Wednesday 20 April 2022

Benford's Law - From February 2021 to the end of July 2021

Today's post was supposed to be about cycling, and withdrawals from the Giro Rosa/Giro d'Italia Femminile compared to withdrawals in the men's Tour de France, but it requires more prose than I am presently capable of (running fencing competitions takes it out of you). 

Instead, let us return to an update to the Benford's Law project which has been chugging along in the background. 

In July, I recorded the first digits in the top news article on the BBC website on 25 of the 31 days. In those 25 articles, there were 261 numbers with leading digits. That's 10-11 per day, which is less than February but the same as March and May.
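For anyone who wants to automate the counting, here is a minimal sketch of pulling leading digits out of an article's text. The regular expression, the `leading_digits` helper and the sample sentence are all hypothetical. The sketch also assumes that every number in the text counts, including years and percentages, which may not match how the numbers in this project were recorded.

```python
import re
from collections import Counter

# Hypothetical pattern: a digit, then any digits or commas, then an optional decimal part.
NUMBER_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")

def leading_digits(text):
    """Count the first non-zero digit of every number found in the text."""
    counts = Counter()
    for number in NUMBER_PATTERN.findall(text):
        for ch in number:
            if ch in "123456789":
                counts[int(ch)] += 1
                break  # only the leading digit matters
    return counts

# Made-up sentence for illustration, not taken from any BBC article.
sample = "About 1,200 people attended, up 35% on 2019, at a cost of 0.75m."
print(leading_digits(sample))
```

Skipping zeros means 0.75 is counted under 7, which is the usual convention for Benford's Law.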

July numbers - 

  Azzeqb.png 

No number appeared exactly as often as expected. 8 was the closest, only 0.1% away from expected. 

1 and 7 are the most different to their expected values with 1 being over-represented and 7 under-represented. 

If you take (observed - expected) squared, divided by the expected, for each digit and add those values together, the calculated test statistic is 3.6, the lowest monthly total so far. 

The critical chi squared value for 9 items with only one line (8 degrees of freedom, at the 5% significance level) is ~ 15.507. The test statistic is smaller than the critical value, therefore the difference is not significant. This data does not disobey Benford's Law. 
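The calculation above can be sketched in a few lines of Python. The expected share of each leading digit d under Benford's Law is log10(1 + 1/d). The function names are my own and the counts are hypothetical, made up purely for illustration. They are not the July figures.

```python
import math

def benford_expected(total):
    """Expected count for each leading digit 1-9 under Benford's Law."""
    return {d: total * math.log10(1 + 1 / d) for d in range(1, 10)}

def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over the digits 1-9."""
    total = sum(observed.get(d, 0) for d in range(1, 10))
    expected = benford_expected(total)
    return sum(
        (observed.get(d, 0) - expected[d]) ** 2 / expected[d]
        for d in range(1, 10)
    )

# Hypothetical counts for illustration only; NOT the data from this post.
example_counts = {1: 30, 2: 18, 3: 12, 4: 10, 5: 8, 6: 7, 7: 6, 8: 5, 9: 4}

# Critical chi squared value for 8 degrees of freedom at the 5% level.
CRITICAL_VALUE = 15.507

statistic = chi_squared(example_counts)
print(f"test statistic = {statistic:.2f}")
print("significant" if statistic > CRITICAL_VALUE else "not significant")
```

Swapping in a month's real counts for `example_counts` should reproduce the test statistic for that month.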

If we look at the rolling total from February to the end of July, there have been 1860 numbers with leading digits.

Rolling total numbers

  Azz9IX.png 

No number exactly matched its expected value. 1 is the number furthest away from its expected value and remains over-represented. 

If you add together the sum of all the values of (observed-expected) squared, all divided by the expected, the calculated test statistic is 2.45, reducing as it should with more numbers. 

The critical chi squared value for 9 items with only one line (8 degrees of freedom, at the 5% significance level) is ~ 15.507. The test statistic is smaller than the critical value, therefore the difference is not significant. This data does not disobey Benford’s Law.

This is a reduction from the test statistic of the total to May, but it's not as low as it was in April.
