Wednesday, 6 October 2021

Benford's Law Posts - Back From A Break With May's Results

This follows the three previous posts.

In May I was better at remembering to add the daily article, managing it on 29 of the 31 days.

Looking at May's articles only, 313 numbers with a leading digit were recorded (10-11 per day: slightly more than April, about the same as March and fewer than February).
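Tallying leading digits from an article's text can be sketched as below. This is a minimal illustration under assumptions, not the method used for this post: the regular expression `NUMBER_RE` and the helpers `leading_digit` and `count_leading_digits` are hypothetical, and the sketch counts every number it finds, whereas the counts in this post may have excluded some (dates, for example).

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical pattern: digits, optional thousands groups (",254"), optional decimal part.
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def leading_digit(token: str) -> Optional[int]:
    """Return the first non-zero digit in a numeric token, or None (e.g. for '0.00')."""
    for ch in token:
        if ch in "123456789":
            return int(ch)
    return None


def count_leading_digits(text: str) -> Counter:
    """Tally the leading digits 1-9 of every number found in a piece of text."""
    counts = Counter()
    for token in NUMBER_RE.findall(text):
        digit = leading_digit(token)
        if digit is not None:
            counts[digit] += 1
    return counts
```

For a number below 1, such as 0.75, the leading digit is taken to be the first non-zero digit (7), which is the usual convention for Benford's Law.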

3 appears in the expected proportion. 1 and 7 differ most from their expected values, with 1 over-represented and 7 under-represented. Summing (observed - expected)²/expected over all nine digits gives a test statistic of 6.67, slightly higher than April's.
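The test statistic described above can be sketched in Python as follows. This is a minimal illustration, not necessarily how the figures in this post were produced; the name `benford_chi_squared` and the dictionary-of-counts input are assumptions. The expected proportion for each leading digit d is log10(1 + 1/d).

```python
import math
from typing import Mapping

# Expected Benford proportion for each leading digit d: log10(1 + 1/d).
# They sum to 1, running from about 30.1% for 1 down to about 4.6% for 9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_chi_squared(observed: Mapping[int, int]) -> float:
    """Sum of (observed - expected)^2 / expected over the digits 1-9.

    `observed` maps each leading digit to how many times it was seen;
    digits that never appeared may be left out and count as zero.
    """
    n = sum(observed.get(d, 0) for d in range(1, 10))
    if n == 0:
        raise ValueError("need at least one leading digit")
    statistic = 0.0
    for d in range(1, 10):
        expected = n * BENFORD[d]
        statistic += (observed.get(d, 0) - expected) ** 2 / expected
    return statistic
```

Each digit's expected count is the total number of leading digits multiplied by its Benford proportion, so 313 numbers give an expected count of roughly 94 for the digit 1.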

The critical chi-squared value for 9 categories in a single row (8 degrees of freedom) at the 5% significance level is approximately 15.507.
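As a check on where 15.507 comes from, here is a standard-library sketch. It assumes a 5% significance level and 9 - 1 = 8 degrees of freedom, and relies on the fact that for an even number of degrees of freedom 2m, the chi-squared tail probability has the closed form exp(-x/2) multiplied by the sum of (x/2)^k / k! for k = 0 to m - 1. The function names are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-squared variable with a positive even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def chi2_critical_value(df: int, alpha: float = 0.05) -> float:
    """Find x with P(X > x) = alpha by bisection (the tail probability falls as x grows)."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    df = 9 - 1  # nine digit categories in one row
    print(f"critical value: {chi2_critical_value(df):.3f}")
    # p-value for May's reported test statistic of 6.67
    print(f"p-value: {chi2_sf_even_df(6.67, df):.2f}")
```

This should print a critical value of about 15.507, and the p-value for 6.67 comes out well above 0.05. If SciPy is available, `scipy.stats.chi2.ppf(0.95, 8)` gives the same critical value.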

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford's Law.

Looking at the rolling total from February to the end of May, 1254 numbers with a leading digit have been recorded.

2 and 3 are the digits closest to their expected values. 1 is the furthest from its expected value and remains over-represented; the next furthest is 6, which is under-represented. Summing (observed - expected)²/expected over all nine digits gives a test statistic of 2.84.
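Combining the monthly tallies into a rolling total, and ranking the digits by how far they sit from Benford's expectation, can be sketched as below. The helper names are hypothetical, and "furthest away" is assumed here to mean the largest contribution (observed - expected)²/expected to the test statistic; the sign of observed - expected shows whether a digit is over- or under-represented.

```python
import math
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

# Expected Benford proportion for each leading digit d: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def rolling_total(monthly_counts: Iterable[Mapping[int, int]]) -> Counter:
    """Add the per-month leading-digit tallies into one running total."""
    total = Counter()
    for month in monthly_counts:
        total.update(month)  # Counter.update adds counts rather than replacing them
    return total


def digit_report(counts: Mapping[int, int]) -> List[Tuple[int, float, float]]:
    """Return (digit, observed - expected, contribution) for digits 1-9,
    with the digit contributing most to the test statistic first."""
    n = sum(counts.get(d, 0) for d in range(1, 10))
    rows = []
    for d in range(1, 10):
        expected = n * BENFORD[d]
        diff = counts.get(d, 0) - expected
        rows.append((d, diff, diff ** 2 / expected))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

The contributions in the report add up to the chi-squared test statistic, so the same output shows both the overall figure and which digits drive it.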

The critical chi-squared value for 9 categories in a single row (8 degrees of freedom) at the 5% significance level is approximately 15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford’s Law.

Interestingly, you might expect the test statistic to fall as more numbers from articles are added. Until now it has (February = 8.6, February + March = 3.49, February + March + April = 2.29), but this time it has risen to 2.84. This may be explained by the articles from the 1st, 7th and 8th of May, which contained a lot of numbers and were heavily skewed towards the digit 1.
