Friday, 22 May 2026

Benford's Law - Refresh and month 1

While writing the F1 scrutineering posts, I realised that was the exact testing I needed to be doing for my Benford's Law project (https://fulltimesportsfan.wordpress.com/2021/03/17/obey-benfords-its-the-law-an-introduction-to-my-benfords-law-project/). 

This makes it an excellent opportunity to redo that project, but better, and to finalise it. 

The Benford's law project focussed on the leading digit of all numbers in the lead articles for one year of BBC.com front pages. 

It began in February 2021. 

The 28 daily news articles contained 436 numbers written as numerals (about 16 per day). 

The data looks like this: 

Bar chart of the observed number of appearances by a leading digit compared to expected, where the difference is described by a standardised residual. One is massively over-represented with a standardised residual of 4.7. 

Calculated, it's: 
X² = 37.434 
df = 8 
p-value = 9.576 × 10⁻⁶ 

The difference between the expected and the observed is statistically significant. 

Therefore, the leading digits do not obey Benford's law. 

Obviously, this is just one month's worth of data. Most of the deviation comes from the digits 1 and 2. 

1 is massively over-represented (with a standardised residual of 4.7) and 2 is underrepresented (standardised residual of -2.4). 3, 4, 7 and 8 are present as often as they are expected, while 5, 6 and 9 are slightly under-represented, with 6 being significantly under-represented (-2.07).
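
For anyone who wants to check the sums, here is a minimal sketch of the calculation in Python, using only the standard library. The function names are my own for illustration, and it assumes the standardised residual is (observed - expected) / sqrt(expected), with the expected counts coming from Benford's formula log10(1 + 1/d). The per-digit counts aren't listed in this post, so the example at the bottom only re-derives the p-value from the reported X² and df.

```python
import math


def benford_expected(total):
    """Expected count of each leading digit 1-9 under Benford's law."""
    return {d: total * math.log10(1 + 1 / d) for d in range(1, 10)}


def standardised_residuals(observed):
    """Assumed definition: (observed - expected) / sqrt(expected) per digit.

    `observed` maps each leading digit (1-9) to how often it appeared.
    """
    expected = benford_expected(sum(observed.values()))
    return {
        d: (observed.get(d, 0) - expected[d]) / math.sqrt(expected[d])
        for d in range(1, 10)
    }


def chi_square(observed):
    """Pearson chi-square statistic: the sum of squared standardised residuals."""
    return sum(r * r for r in standardised_residuals(observed).values())


def chi_square_p_value(statistic, df=8):
    """Upper-tail p-value, using the closed form that holds for even df only."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form needs an even df of at least 2")
    half = statistic / 2
    term = 1.0
    series = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! step by step
        series += term
    return math.exp(-half) * series


if __name__ == "__main__":
    # Re-derive the p-value from the statistic reported above.
    print(chi_square_p_value(37.434, df=8))  # about 9.57e-06
```

Nine digits give 8 degrees of freedom, which is why the even-df shortcut is enough here; a stats library would be the safer choice for anything more general.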

Further reports to follow (I make no promises on the timeline; the World Cup and the Tour de France will keep me busy).

Saturday, 16 May 2026

Are the scrutineers picking on my driver? - Update after the Miami Grand Prix

The Miami Grand Prix itself: 

Bar chart of driver numbers on the y-axis compared to standardised residual levels for the Miami Grand Prix along the x-axis. The driver the most over-checked compared to expected is Leclerc (car 16), with a value of 0.52 compared to expected. Cars 6 and 30 (Hadjar and Lawson) were the cars tested least compared to expected, with a value of -0.35 compared to expected. Cars 1, 5, 18, 41, 55, 63 and 77 were tested exactly as much as expected. 

So yes, they are picking on my driver, that's the only reason Leclerc (car 16) could be the most over-tested driver compared to expected. 

More sensibly, none of those numbers are significantly away from expected. There was no "extra check of a random top 10 finisher", even though there were more than 15 finishers. So the reason for the check isn't there being 15 or more finishers - that's one theory ruled out. Maybe they only do the test every other race? At least that's a testable hypothesis. 

The season to date: 

Bar chart of driver numbers on the y-axis compared to standardised residual levels for the season to date along the x-axis. The driver the most over-checked compared to expected is Russell (car 63), with a value of 0.51 compared to expected. Car 6 (Hadjar) is the car tested least compared to expected, with a value of -0.51 compared to expected. Cars 1, 5, 87 (Norris, Bortoleto and Bearman) were tested exactly as much as expected. 

Over the season so far, Russell's car (63) is the most over-tested compared to expected, which makes sense given he's finished all 4 races, and quite highly in all of them. 

Car 6, Hadjar, is the least tested compared to expected. It makes some sense, he's not finished two of the races, but he's not alone in that. Possibly it's just a quirk of small numbers, because there have only been 136 expected tests. 

1, 55 and 87 (Norris, Sainz jnr and Bearman) have been tested exactly as much as expected. That's fewer cars than after the Japanese Grand Prix, but I think that's because a lot of people had incidents during the race, so there was more variation in the number of tests. 

None of the differences are statistically significant. 

After 4 races of a potential 22, no drivers have been significantly over-tested, but how close the number of checks sits to expectation varies from driver to driver, possibly as a function of small numbers at this point in the season.

Saturday, 9 May 2026

Formula 1 2026 - Miami Grand Prix

Before the race: 

To have one car burst into flame is unfortunate, two smacks of carelessness, Audi. 

I have been caught in a US thunderstorm before (Gulf of Mexico in my case), I do not blame them for moving the race. Y'all have excessive weather. 

The race itself: 

It all kicked off at the start, didn't it? 

Hadjar vs his car is one of *the* pictures of sports frustration. 

Photo of Isack Hadjar, F1 driver, angry that he crashed. He is still seated in the car but his arms are raised in frustration. 

I sympathise with Sainz jnr's complaints about Verstappen's aggressive overtaking style. Personal opinion, Verstappen's going to keep doing it until someone counter-bulldozes. Complaining to the stewards doesn't work. 

Sky and BBC both suggested that the way stewards' investigations happened at the Miami Grand Prix made it feel like there is one rule for the top of the field and one rule for everyone else when it comes to the timing of penalty decisions. I have tremendous sympathy for the impossible position the stewards are in, and I accept that there's too much going on for them to make decisions on everything at the time. A simple solution might be to decide that everything will be reviewed after the end of the race, or maybe that incidents after the halfway point will be reviewed after the end of the race. If they stuck to a rule like that, people wouldn't complain about inconsistency (or at least not about inconsistency around this). 

Even amidst the small changes around the rules, one thing doesn't change - Ferrari's strategy causing a driver to have a breakdown on radio. At some point, even a Magic 8 Ball would do better. Or, the way Andrew Benson phrased it on BBC Radio [slight paraphrase] - "Ferrari's strategy is a persistent mystery to most in the paddock". 

I think that Leclerc should get bonus points for not crashing at the end, not a time penalty, but this is why I am not allowed to be a steward. 

My opinion on the tweaks to the tech regs: 

These changes have been made in the middle of a season, so there is no way they could have been large changes. Verstappen wanted much larger changes and was always going to complain when he didn't get them. Part of Verstappen's problem isn't the regs, it's that Mercedes got the regulations right and designed a car that works under them and that McLaren have been able to catch up a lot more quickly than Red Bull have been able to. While there are reasonable complaints to be made about the new regs, they gave us an interesting Miami Grand Prix, which I thought was impossible.

Tuesday, 5 May 2026

Are the scrutineers picking on my driver? - Update after the Japanese Grand Prix

The Japanese Grand Prix itself: 

Bar chart of driver numbers on the y-axis compared to standardised residual levels for the Japanese Grand Prix along the x-axis. The driver the most over-checked compared to expected is Piastri, car 81, with a value of 0.51 compared to expected. Next is a cluster of 4 drivers, 63 - Russell, 44 - Hamilton, 10 - Gasly and 5 - Bortoleto on 0.34. 9 cars were tested exactly the amount expected. Car 87 - Bearman was the car tested least vs. expected at -0.51, followed by car 18, Stroll, on -0.34. 

Further to my theory that not finishing means less testing, once Piastri (car 81) finishes a race, he is the most tested car in that race. 

With 20 finishers, the bonus testing of one of the top 10 finishers happened. This time it was Lewis Hamilton, car 44. This is more than 15 so that fits in with my theory from the Chinese Grand Prix post. 

Bearman (car 87), one of the two non-finishers, was the least tested car. Stroll, car 18, the other non-finisher, is the second least tested. This supports the theory that non-finishers get tested less. 

As with the previous races, none of the differences are statistically significant. 

Season to date up until the end of the Japanese Grand Prix: 

Bar chart of driver numbers on the y-axis compared to standardised residual levels for the season, up to the Japanese Grand Prix, along the x-axis. The driver the most over-checked compared to expected is Russell, car 63, with a value of 0.59 compared to expected. There is a cluster of 7 cars with exactly the expected number of tests. Car 14, Alonso, is the least tested compared to expected (with -0.59). 

Russell (car 63) is the most tested compared to expected with a standardised residual of 0.59. 

Alonso (car 14) is the least tested compared to expected, possibly because of how few laps that Aston Martin has completed this season. 

One good race has catapulted Piastri from the most under-measured to the middle of the pack. This demonstrates why I think the numbers will tend towards the expected as the season progresses, except for possibly the drivers who are frequently in the top 10 and those whose cars do not finish. 

None of the differences are statistically significant, and now 7 drivers are exactly on expected, fitting in with the theory that the numbers will converge as the season progresses.