Wednesday, 30 June 2021

Euro 2020 - Quarterfinal Network Diagrams

We're down to the last 8.

There's a tight group of England and Spain (I blame Kieran Trippier), almost joined by Belgium, and then everyone else is much more dispersed.


Italy are the national team closest to the centre, with Atalanta the club closest to the centre.

With the exit of France, Portugal and Germany, several of the bigger club teams have lost many players.  The team with the most players represented is now Chelsea with 7, followed by Slavia Prague, Napoli and Juventus with 5.

Somehow, there are 13 communities, for 8 teams.  Despite England and Spain being 1 community.  A few random club teams are coloured differently to the rest of their "group" for reasons I do not know or understand.  One day, I will understand the community algorithm.  Today, once more, is not that day.




Friday, 25 June 2021

Euro 2020 - Second Round Network Data Viz

First, a round up of my predictions for who would be eliminated in the group stages, as *someone* doubted the utility of the diagrams.  My prediction was that "Finland, Russia, Ukraine and North Macedonia to be amongst the 8, with some of the remaining 4 coming from Wales, Turkey, Hungary, Portugal and the Czech Republic", so that's 5/8 correct, including Turkey, who more respected pundits had as alleged dark horses.

Yes, I make the diagrams because they look pretty and making them brings me great joy but the "outlying teams are eliminated first" theory has now held up over several tournaments.

With the removal of 8 teams, what does the diagram look like now?

Unlabelled 

and

labelled

The club teams with the most players represented are Chelsea and Manchester City with 14 followed by Bayern Munich with 13.  After the first round both Chelsea and Bayern have lost 1 player.  No Manchester City players were eliminated.

Juventus are the club closest to the centre, with Spain, just about, being the national team closest to the centre.

The communities view is interesting.



Although there are 16 teams left in, there are only 13 communities, with Germany + France, Spain + England, Austria + Switzerland forming multi-team communities.

As for predictions, from the layout of the network diagram, things do not look good for Wales, Austria, Czech Republic, Croatia and Ukraine.  England vs Germany, France vs Switzerland and, to a lesser extent, Belgium vs Portugal are too close to call, working only from the network diagram.  

As a football fan, I am looking forward to those last two matches a lot.

Wednesday, 16 June 2021

Euro 2020 Interconnectivity Diagrams - Mid-Group Stage Round-Up

While other players can't be replaced during the tournament, goalkeepers, because of their importance*, can be.  Unfortunately, three goalkeepers have been injured and have had to be replaced.  The three play for Switzerland, Czech Republic and England.

Other than removing the only Montpellier player, the main change to the network is that Wales are moved further out, and Czech Republic are moved further in.

Germany and Atalanta remain the teams closest to the centre.  Chelsea remain represented by the most players, 15, followed by Manchester City and Bayern Munich on 14.

There remain 20 communities from 24 teams, but the multi-team groups are now Austria and Switzerland, England and Spain, Germany and France, and Sweden and Denmark.  That's probably because the Swiss goalie who has come in, Gregor Kobel, plays for VfB Stuttgart, whose only other representatives play for Austria and North Macedonia, pulling Switzerland closer to Austria.


* I am a fully paid-up member of the Union of Goalkeepers and Associated Trades.  Ask me why you shouldn't block hockey balls with your instep.

Sunday, 13 June 2021

Euro 2020 Interconnectivity Diagrams

I'll begin with the unlabelled figure, because the labelled figure is very busy.

The colours are my attempt at replicating the official UEFA Euro 2020 colours, as best I can with the limited Gephi colour palette.

It is not the prettiest of this type of figure I've produced.  I particularly don't like the cluster right in the middle of the diagram (between Germany and Bayern Munich in the labelled figure).

I did warn you it was busy.  That busyness, the tightness of the clustering, was what really jumped out at me.  I've been making these diagrams since Euro 2012, and this is by far the most tightly inter-connected.

Looking back at the most recent international tournament before this (World Cup 2018), although that too was very busy, it wasn't as busy.  There was at least some space between most teams and some clearly outlying teams.

This time, the nearest there are to outlying teams are Finland, Russia, Ukraine and North Macedonia.  Wales and Turkey are not as outlying, but would be in the next layer in, followed by Portugal, Hungary and the Czech Republic.

Germany are the national team closest to the centre, while Atalanta are the club team closest to the centre.  Chelsea are represented by the most players, 15, followed by Manchester City and Bayern Munich on 14.
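For anyone wondering how counts like "Chelsea are represented by the most players" fall out of the data, here is a minimal sketch. It assumes the diagram is built from a squad list of (player, national team, club) rows, with each player linking a national team to a club. The rows, player names, nations and clubs below are hypothetical placeholders, not the real squads, and the function names are mine.

```python
from collections import Counter

# Hypothetical squad list: (player, national team, club).
# Placeholder names only; these are NOT the real Euro 2020 squads.
squad_rows = [
    ("player_01", "Nation A", "Club X"),
    ("player_02", "Nation A", "Club X"),
    ("player_03", "Nation A", "Club Y"),
    ("player_04", "Nation B", "Club X"),
    ("player_05", "Nation B", "Club Y"),
    ("player_06", "Nation C", "Club Z"),
]


def build_edges(rows):
    """Weight of each national team-club edge = number of players linking them."""
    return Counter((nation, club) for _player, nation, club in rows)


def players_per_club(rows):
    """How many players each club has at the tournament."""
    return Counter(club for _player, _nation, club in rows)


def shared_clubs(rows, nation_1, nation_2):
    """Clubs that supply players to both national teams."""
    clubs_1 = {club for _p, nation, club in rows if nation == nation_1}
    clubs_2 = {club for _p, nation, club in rows if nation == nation_2}
    return clubs_1 & clubs_2


edges = build_edges(squad_rows)
club_counts = players_per_club(squad_rows)
print(club_counts.most_common(1))
print(shared_clubs(squad_rows, "Nation A", "Nation B"))
```

The edge weights from `build_edges` are the sort of thing that could be exported as an edge list for a tool like Gephi, though the exact import format is not covered here.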

A couple of notes: no, you're not imagining it, Spain, Poland and the Netherlands have slightly smaller squads than the other teams.  Spain chose to only take 24 players, while Poland decided not to replace Arkadiusz Milik when he had to withdraw injured.  The Netherlands made the same choice when Donny van de Beek withdrew injured.

Every team except Wales has at least 1 player playing in their home league.  All the teams have at least one player playing in a foreign league.  That is a first: normally it's England who don't have any, but even they have 3 non-England-based players this time.  This may be the cause of the increased inter-connectedness of the diagram, although the number of non-English players in the Premier League has always been high.  Another cause could be the increase in squad size to 26 from 23 (so 72 extra players across the 24 teams, had everyone used the full 26 players).

Looking at the community view, there are 20 communities but 24 teams.  



France, Germany and Switzerland form one community, as do England and Spain, and Sweden and Denmark.  This again demonstrates how tightly interconnected parts of the diagrams are.

This is probably due to the number of club teams their national team players share.
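To make the "shared clubs pull teams into one community" idea concrete, here is a small sketch. It assumes the communities come from a modularity-based method (Gephi's modularity statistic is based on the Louvain method), and it only scores two hand-made groupings rather than running the full algorithm. The nations, clubs and edge weights are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a grouping on a weighted, undirected graph.

    edges: {(node_a, node_b): weight}
    community_of: {node: community label}
    """
    total_weight = sum(edges.values())
    internal = defaultdict(float)  # edge weight falling inside each community
    degree = defaultdict(float)    # summed node degrees in each community
    for (a, b), weight in edges.items():
        degree[community_of[a]] += weight
        degree[community_of[b]] += weight
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += weight
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Hypothetical network: Nations A and B both draw heavily on Clubs X and Y.
toy_edges = {
    ("Nation A", "Club X"): 3,
    ("Nation B", "Club X"): 3,
    ("Nation A", "Club Y"): 2,
    ("Nation B", "Club Y"): 2,
    ("Nation C", "Club Z"): 5,
    ("Nation C", "Club X"): 1,
}

# Grouping 1: A and B share a community with the clubs they have in common.
merged = {"Nation A": 0, "Nation B": 0, "Club X": 0, "Club Y": 0,
          "Nation C": 1, "Club Z": 1}
# Grouping 2: every national team gets its own community.
split = {"Nation A": 0, "Club X": 0, "Nation B": 1, "Club Y": 1,
         "Nation C": 2, "Club Z": 2}

print(modularity(toy_edges, merged) > modularity(toy_edges, split))
```

In this toy case the merged grouping scores higher, which is the kind of result that would lead a modularity-based method to put two national teams in one community. It says nothing about why a real run might also split off individual clubs.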

Historically, teams that are outlying do less well.  With the way the tournament is organised, 8 teams are sent home after the first round.  From this diagram, I would expect Finland, Russia, Ukraine and North Macedonia to be amongst the 8, with some of the remaining 4 coming from Wales, Turkey, Hungary, Portugal and the Czech Republic (I didn't expect the last two).  

Two caveats: firstly, this prediction was made from the diagram only, and has ignored what has gone on in the first two days of matches.  Secondly, it doesn't take the groups into account, so I know no one from group E is included but more than 2 groups have 2 teams named.  I am reasonably confident in the prediction about Finland, Russia, Ukraine and North Macedonia, a lot less confident about the others.  The lack of separation, due to the tight clustering, makes it hard to spot outliers.

Wednesday, 9 June 2021

F1 2021 - Azerbaijan Grand Prix

The Baku chaos bonanza was definitely open for business on Sunday.  If nothing else, *something* always happens at the Azerbaijan Grand Prix.

I could have done with fewer tyre blow-outs.  I'm with Mark Webber, both the accidents could have been a lot worse, very easily.

Speaking of Aussie Grit, he treated us to yet more amusing weird commentary noises.  I think the moment deserved that shriek.  Because, yes, by mistakes like Hamilton's are Drivers' World Championships won and lost.

It was an amazing race for swings in Championship lead, and there was actual overtaking.  Non-DRS overtaking at that.

From an infinitely more biased Ferrari perspective, the good (and the cookie) come from the pole position.  If only the car had performed in the race.  I know I'm greedy, but when there's a pole position, one hopes for a podium place at least.



Thursday, 3 June 2021

Do April's lead articles obey Benford's Law? And how does the running total look?

These are the results of the third month of monitoring news articles for which numbers they contain.

I missed a couple more days in April (I blame Easter), and I will catch these up at the end of the year.

In the 27 days I did manage to capture, 232 numbers were used in the leading news articles on bbc.co.uk (~ 8 to 9 per day).  This is slightly less than the 9-10 in March and a lot less than the 15 per day from February.


9 is the number closest to its expected value.  2 is over-represented, 8 is under-represented. If, for each digit, you take (observed - expected) squared, divide by the expected, and add the nine results together, the calculated test statistic is 5.7.

The critical chi-squared value for 9 categories in a single row (8 degrees of freedom) at the 5% significance level is ~ 15.507.

The test statistic is smaller than the critical value, therefore the difference is not significant. This data does not disobey Benford's Law.
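The calculation can be sketched in a few lines. This assumes the test is on leading digits 1 to 9, with expected proportions of log10(1 + 1/d) under Benford's Law, and 8 degrees of freedom. The counts below are made up for illustration and are not the April data; the function names are mine.

```python
import math

# Critical chi-squared value for 8 degrees of freedom (9 digits - 1) at the 5% level
CRITICAL_VALUE = 15.507


def benford_expected(total):
    """Expected count of each leading digit 1-9 under Benford's Law."""
    return {d: total * math.log10(1 + 1 / d) for d in range(1, 10)}


def chi_squared_statistic(observed):
    """Sum over digits 1-9 of (observed - expected)^2 / expected."""
    total = sum(observed.values())
    expected = benford_expected(total)
    return sum(
        (observed.get(d, 0) - expected[d]) ** 2 / expected[d]
        for d in range(1, 10)
    )


# Made-up leading-digit counts (NOT the blog's data), 100 numbers in total.
example_counts = {1: 30, 2: 18, 3: 12, 4: 10, 5: 8, 6: 7, 7: 6, 8: 5, 9: 4}

statistic = chi_squared_statistic(example_counts)
if statistic < CRITICAL_VALUE:
    print(f"{statistic:.2f}: not significantly different from Benford's Law")
else:
    print(f"{statistic:.2f}: significantly different from Benford's Law")
```

A statistic below the critical value only means the counts are not significantly different from Benford's Law at that level, which is why the wording "does not disobey" fits better than "obeys".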

If you look at the rolling total of February to the end of April, the numbers are starting to add up.  Since the start of February, there have been 941 digits in headline news articles.


5 is the number closest to its expected value.  1 remains over-represented, while 6 is under-represented. If, for each digit, you take (observed - expected) squared, divide by the expected, and add the nine results together, the calculated test statistic is 2.29.

The critical chi-squared value for 9 categories in a single row (8 degrees of freedom) at the 5% significance level is ~ 15.507.

The test statistic is smaller than the critical value, therefore the difference is not significant. This data does not disobey Benford's Law.

Interestingly, as more numbers from articles have been added the calculated test statistic has reduced (February = 8.6, February + March = 3.49, February + March + April = 2.29).  This is what you would expect to see if the numbers in the articles fulfill Benford's law.