Thursday 9 February 2023

Benford's Law - From February 2021 to the end of August 2021

I never actually drop projects; I just don't update them for a while. So let us return to the Benford's Law project, with information about the first digits of the numbers in the top news article on the BBC website on 26 of the 31 days of August 2021.

In those 26 articles, there were 398 numbers with leading digits. That's about 15 per day, which is about the same as June but more than July.
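For anyone wanting to repeat the count, here is a minimal sketch of how leading digits could be pulled out of an article's text. The exact counting rules used for this project are not spelled out in the post, so the pattern `NUMBER_PATTERN` and the helpers `leading_digit` and `count_leading_digits` are hypothetical names and assumptions. In particular, this sketch treats the first non-zero digit as the leading digit, so 0.5 counts as a 5.

```python
import re
from collections import Counter

# Assumed pattern: a run of digits, optionally containing commas and a decimal part.
NUMBER_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?|\.\d+")


def leading_digit(token):
    """Return the first non-zero digit of a numeric token, or None if it has none."""
    for ch in token:
        if ch in "123456789":
            return int(ch)
    return None


def count_leading_digits(text):
    """Count how often each leading digit 1-9 occurs among the numbers in text."""
    counts = Counter()
    for token in NUMBER_PATTERN.findall(text):
        digit = leading_digit(token)
        if digit is not None:  # skip tokens such as "0" that have no leading digit
            counts[digit] += 1
    return counts
```

Running `count_leading_digits` over each day's article and adding the resulting counters together would give monthly totals of the kind discussed here.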

Most of those numbers came from the article on the 8th of August (https://www.bbc.co.uk/sport/olympics/58112331), which was about the performance of different sports at the Tokyo Olympics compared to their funding.

[Chart: August-only]

No digit appeared exactly as often as expected. 5 was the closest, but even that was 1% away from expected. 1 and 2 are the most different from their expected values; both are over-represented.

If you take (observed - expected) squared, divided by the expected, for each of the nine digits and add those values together, the calculated test statistic is 8.5, the highest since February itself.
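The calculation of the test statistic can be sketched as follows. This is an illustration rather than the spreadsheet or script actually used for the project. The function names `benford_expected` and `chi_squared_statistic` are hypothetical, and the expected counts come from the standard Benford proportion log10(1 + 1/d) for each digit d.

```python
import math


def benford_expected(total):
    """Expected count of each leading digit 1-9 under Benford's Law."""
    return {d: total * math.log10(1 + 1 / d) for d in range(1, 10)}


def chi_squared_statistic(observed):
    """Sum of (observed - expected)^2 / expected over the digits 1-9.

    observed maps each digit to how many times it appeared as a leading digit.
    """
    total = sum(observed.values())
    expected = benford_expected(total)
    return sum(
        (observed.get(d, 0) - expected[d]) ** 2 / expected[d]
        for d in range(1, 10)
    )
```

The per-digit counts are not listed in this post, so no worked numbers are given here. Passing a dictionary such as `{1: ..., 2: ..., ..., 9: ...}` filled with the observed counts would return the statistic.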

The critical chi-squared value for 9 items with only one line (that is, 8 degrees of freedom) is ~ 15.507 at the 5% significance level. The test statistic is smaller than the critical value, therefore the difference is not significant. This data does not disobey Benford's Law.
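The comparison itself is a one-line check. The sketch below hard-codes the critical value quoted in the post, which matches the standard table value for 8 degrees of freedom (9 digits minus 1) at the 5% significance level. The names `CRITICAL_VALUE` and `is_significant` are hypothetical.

```python
# Critical value quoted in the post: 8 degrees of freedom, 5% significance level.
CRITICAL_VALUE = 15.507


def is_significant(test_statistic, critical_value=CRITICAL_VALUE):
    """Return True if the departure from Benford's Law is significant."""
    return test_statistic > critical_value


# August 2021 test statistic from the post.
print(is_significant(8.5))  # False
```

A result of `False` means the test gives no grounds to say the data disobeys Benford's Law.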

If we look at the rolling total from February to the end of August, there have been 2258 numbers with leading digits.

[Chart: February-to-August]

No digit exactly matches its expected value; 5 is the closest. 1 is the digit furthest away from its expected value and remains over-represented.

If you take (observed - expected) squared, divided by the expected, for each of the nine digits and add those values together, the calculated test statistic is 3.00, which is not reducing the way it should with the addition of more first digits that obey Benford's Law. However, as the critical chi-squared value for 9 items with only one line is ~ 15.507, the test statistic is smaller than the critical value and therefore the difference is not significant. This data does not disobey Benford's Law.

The test statistic continues to fluctuate rather than reduce, which is interesting.
