Recently I’ve been wondering how the attacking styles of different Premier League teams relate to each other, and what the best method is for finding relationships between them. Cluster analysis is one way to group data points based on a few characteristics (e.g. passes attempted, shots attempted, etc.) that describe each data point, and it attempts to find underlying relationships between different subsets of those data points. The hierarchical version I use here produces a plot (a dendrogram) that looks somewhat like a family tree: data points that score similarly across all of their characteristics are grouped closely, and data points that are very dissimilar to each other are grouped far apart. It provides a nice way to visualise at a glance how closely teams are related across a large number of characteristics.
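To make ‘scoring similarly across all characteristics’ a bit more concrete, here’s a minimal Python sketch of one common way to measure it. It assumes each team is a vector of per-game stats, that each stat is first standardised into a z-score (so passes, which number in the hundreds, don’t swamp crosses or shots), and that dissimilarity is plain Euclidean distance. The team names and numbers are hypothetical placeholders, not real Premier League data, and this isn’t necessarily the distance measure behind the plot.

```python
import math
from statistics import mean, pstdev


def standardize(profiles):
    """Convert each stat (column) to a z-score across teams so every stat is on the same scale."""
    columns = list(zip(*profiles.values()))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero for a constant stat
    return {
        team: [(x - mu) / sd for x, mu, sd in zip(row, mus, sds)]
        for team, row in profiles.items()
    }


def euclidean(a, b):
    """Straight-line distance between two standardised team profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical per-game averages: [short passes, long passes, crosses]
profiles = {
    "Team A": [520.0, 45.0, 14.0],
    "Team B": [500.0, 48.0, 15.0],
    "Team C": [300.0, 75.0, 22.0],
}

z = standardize(profiles)
for a in z:
    for b in z:
        if a < b:  # print each pair of teams once
            print(f"{a} vs {b}: {euclidean(z[a], z[b]):.2f}")
```

With these made-up numbers Teams A and B come out much closer to each other than either is to Team C, which is the kind of pattern the clustering then turns into neighbouring branches of the tree.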
I’ll keep this description short, as I’ll update this blog in a couple of days with a more detailed methods section. If you want details of the actual clustering analysis, scroll to the bottom of this article. When choosing the attacking characteristics for my analysis, I took a different approach from some very good analytics articles I’ve read recently, which take very detailed match stats into account (e.g. pass completions in each section of the pitch and exact shot locations – http://www.optasportspro.com/about/optapro-blog/posts/2016/blog-grouping-team-styles/ and https://fotbollssiffror.wordpress.com/2017/05/15/clustering-shot-chains-different-types-of-chances-and-how-much-they-are-worth/). Instead, I wondered how much we could learn from really basic team attacking summary stats alone. I also didn’t want the analysis to separate teams into ‘good’ or ‘bad’ attackers purely because I was using outcome stats, so I didn’t use stats that reflect success rates (e.g. shots on target, successful passes, goals, assists). Instead, the characteristics I used were based on the way teams attempted to play football. The stats were per-game averages of: Long Shots (outside the box), Near Shots (inside the box), Long Passes, Short Passes, Wing Play (how much of the team’s attacking focus is on width), Dribbles, Crosses, Open Play Shots and Set Piece Shots. I didn’t have attempted through balls in my data set; otherwise I’d have included them! The stats I used here were correct as of the international break for the World Cup qualifier second legs.
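As a rough illustration of how a family-tree plot can be built from stats like these, here’s a minimal sketch of agglomerative hierarchical clustering in plain Python. Assumptions: the nine stats above are z-scored, distances are Euclidean, and clusters are merged by average linkage (repeatedly join the two groups with the smallest mean pairwise distance). These are common choices, not necessarily the settings behind the plot, and the four teams and their numbers are hypothetical.

```python
import math
from statistics import mean, pstdev

# The nine attacking stats, in the order used for each team's list below
STATS = ["long_shots", "near_shots", "long_passes", "short_passes", "wing_play",
         "dribbles", "crosses", "open_play_shots", "set_piece_shots"]

# Hypothetical per-game values in STATS order (not real Premier League data)
teams = {
    "Team W": [6, 10, 50, 520, 60, 12, 15, 13, 3],
    "Team X": [6, 9, 52, 500, 62, 11, 16, 12, 3],
    "Team Y": [3, 6, 80, 300, 70, 7, 22, 6, 4],
    "Team Z": [3, 6, 78, 310, 72, 8, 21, 6, 4],
}


def standardize(profiles):
    """Z-score each stat (column) across teams."""
    cols = list(zip(*profiles.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant stat
    return {t: [(x - m) / s for x, m, s in zip(row, mus, sds)] for t, row in profiles.items()}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters,
    where cluster distance is the mean of all pairwise team distances.
    Returns the merges in order as (cluster_a, cluster_b, distance)."""
    clusters = [frozenset([name]) for name in points]
    merges = []

    def dist(c1, c2):
        return sum(euclidean(points[a], points[b]) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        merges.append((a, b, dist(a, b)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a | b]
    return merges


for a, b, d in average_linkage(standardize(teams)):
    print(sorted(a), "+", sorted(b), f"join at height {d:.2f}")
```

Each merge corresponds to a junction in the dendrogram, and the merge distance is the height at which the two branches join. In practice you’d reach for a library routine such as SciPy’s `scipy.cluster.hierarchy.linkage` (with `method="average"`) rather than this naive loop.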
In the plot above it’s clear that there are two main, opposed branches of the family tree, which you could lazily lump into ‘The top 6 + Southampton’ and ‘Everyone else’. What is also clear, though, is that there are a number of smaller clusters of teams within these two large branches. I’ve summarised these below as heatmaps (red = a team does this more than average, blue = a team does this less than average). I’ve also given a certainty (%) for each cluster, which indicates how strongly the data support that grouping of teams; it is based on the approximately unbiased (AU) bootstrap p-value.
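The approximately unbiased (AU) p-value is what the R package pvclust reports, and it is computed by multiscale bootstrap resampling, which is more involved than fits in a short example. As a simpler stand-in, the sketch below computes the ordinary bootstrap probability (BP): resample the stats (columns) with replacement, recluster, and count how often a given group of teams comes out as a cluster. pvclust reports BP alongside AU, and its documentation describes AU as the better approximation to an unbiased p-value, so treat this as an illustration of the idea rather than the method behind the percentages. The teams, numbers, average linkage and Euclidean distance are all assumptions.

```python
import math
import random
from statistics import mean, pstdev


def standardize(profiles):
    """Z-score each stat (column) across teams."""
    cols = list(zip(*profiles.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return {t: [(x - m) / s for x, m, s in zip(row, mus, sds)] for t, row in profiles.items()}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def clusters_formed(points):
    """Average-linkage clustering; return every cluster (set of teams) created by a merge."""
    clusters = [frozenset([t]) for t in points]
    formed = set()

    def dist(c1, c2):
        return sum(euclidean(points[a], points[b]) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        formed.add(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return formed


def bootstrap_probability(profiles, target, n_boot=1000, seed=0):
    """Share of bootstrap replicates (stats resampled with replacement) in which
    the teams in `target` form a cluster. This is the ordinary BP, not the AU p-value."""
    rng = random.Random(seed)
    # Each column is z-scored on its own, so a resampled column keeps the same z-scores
    # whether we standardise before or after resampling.
    z = standardize(profiles)
    n_stats = len(next(iter(z.values())))
    target = frozenset(target)
    hits = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n_stats) for _ in range(n_stats)]  # draw stats with replacement
        replicate = {t: [row[k] for k in idx] for t, row in z.items()}
        if target in clusters_formed(replicate):
            hits += 1
    return hits / n_boot


# Hypothetical per-game values for nine attacking stats (not real data)
teams = {
    "Team W": [6, 10, 50, 520, 60, 12, 15, 13, 3],
    "Team X": [6, 9, 52, 500, 62, 11, 16, 12, 3],
    "Team Y": [3, 6, 80, 300, 70, 7, 22, 6, 4],
    "Team Z": [3, 6, 78, 310, 72, 8, 21, 6, 4],
}

print(bootstrap_probability(teams, {"Team W", "Team X"}))
print(bootstrap_probability(teams, {"Team W", "Team Y"}))
```

With these made-up numbers every single stat separates the two pairs cleanly, so {W, X} should appear as a cluster in every replicate and {W, Y} in none; real data would give values in between, like the percentages quoted for each cluster.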
Cluster 1 (96% certainty) – Teams who focus on short passing, a high number of dribbles, less focus on wing play, fewer crosses, and more diverse shot types and locations
This first cluster is made up of teams that predominantly play short passes, attempt dribbles (not so much Arsenal) and generate most of their shots from open play. Interestingly, Arsenal and Chelsea seem to focus their attacks down the centre of the pitch (and don’t put in many crosses), whilst Liverpool and Man City are more balanced between the wings and the centre of the pitch. All of the teams in this cluster also attempt long-range shots and generate a decent number of efforts from set pieces, so they are all fairly diverse in shot type and location.
Cluster 2 (77% certainty) – Teams who attempt a lot of crosses, a high number of dribbles, more focus on wing play, more diverse passing types
This cluster is slightly less well defined than the others, but the teams here differ from cluster 1 in that they have slightly more balance between long passes and short passes attempted, so they are more willing to mix up their play. Teams in this cluster also attempt an above-average number of crosses per game. Spurs are interesting in that they try more long-range efforts and shots from set pieces, showing they have a slightly more versatile attack. Spurs might otherwise belong to cluster 1, but they focus more on attempting dribbles and crosses than the teams in cluster 1 do.
Cluster 3 (99% certainty) – Teams who attempt more long balls, focus play on the wings, don’t generate many shots from open play and don’t attempt many shots from long range
Teams in this cluster have a very clearly defined preferred style of play: generally they attempt an above-average number of long balls and a below-average number of short passes. Most of the teams in this cluster, apart from Burnley, also focus their attacking down the centre of the pitch. Burnley’s below-average numbers for long-range and close-range shots reflect the fact that they don’t attempt many shots per game in general but, unlike every other team in this cluster, they are above average for generating shots from set pieces.
Cluster 4 (100% certainty) – Teams who focus on wing play, attempt a lot of dribbles and are slightly more willing to shoot from range than cluster 3
The teams in cluster 4 are similar to those in cluster 3, except that they don’t try quite as many long balls per game. Like cluster 3 teams, they generally focus their attack down the wings, but they attempt more dribbles per game.
Cluster 5 (98% certainty) – Teams who focus their attack down the centre and do not attempt dribbles
Clusters with only two teams are quite difficult to relate to the other clusters, but the gist here is that Newcastle and Everton are both quite well balanced between attacking on the wings and through the centre of the pitch. Everton have a slight bias towards playing long balls compared to other teams, whilst Newcastle attempt more shots from range than average.
Cluster 6 (76% certainty) – Teams without a consistent attacking plan
The final cluster isn’t very well defined, but that pretty much matches the attacking approach of both managers. Interestingly, Huddersfield are 20th and Bournemouth 18th in the table for ‘Big Chances Missed’, which means either that both teams are quite clinical or that they don’t create many chances. The heatmap showing that both teams are below average for long shots and close shots attempted suggests the latter. Compared to most teams, it is also striking how few shots Huddersfield and Bournemouth produce from set pieces.