Data Analysis: Facts About Soccer
Every four years, a flurry of activity and joyful fanaticism descends upon a designated host nation for what is arguably the biggest sporting event in the world. Yes, we’re talking about the FIFA World Cup! And with the tournament in full swing, what better opportunity to showcase the power of Gigasheet than such a world-renowned competition?
As with any mass spectator sport, numbers rule: data and more data. From player and match analyses to the number of committed fans filling entire stadiums in unrelenting allegiance to their favorite team, football, a.k.a. soccer, naturally invites plenty of statistical inference and forecasting.
However, for all its gilded touches, football is also beset by the sort of predictability that comes with monetization. For example, in a recently published paper by The Royal Society, the four major European leagues studied (England, Germany, Portugal and Spain) exhibited a clear evolution towards inequality based on financial indicators, a gradual process that the study describes as the “gentrification” of football.
But, on to the fun part: the paper also appears to highlight the importance of home-field advantage, and it quantifies match scores for several leagues across the world (from 2000 through 2016) in those terms. Let’s use Gigasheet to check whether home teams still hold an advantage despite any other biases.
Exploring the Football (Soccer) Data With Gigasheet
Loading and parsing datasets into Gigasheet takes mere seconds. The tool can handle billions of records at a time, so 200K rows is a walk in the park. Soon enough, our table looks like this:
Data in this table are laid out as follows:
Match date (DATE).
Home team (HT).
Away team (AT).
Home team score (HS).
Away team score (AS).
Goal difference (GD = HS - AS).
A single variable indicating whether the final score ended in a win, draw, or loss (WDL) for the home team.
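To make the two derived columns concrete, here is a minimal Python sketch of how GD and WDL follow from the scores. The function names and the "W", "D", "L" labels are assumptions made for illustration; the dataset may encode the outcome differently.

```python
def goal_difference(home_score: int, away_score: int) -> int:
    """GD as defined in the column list: home score minus away score."""
    return home_score - away_score


def home_result(home_score: int, away_score: int) -> str:
    """Outcome from the home team's point of view.

    The labels "W", "D" and "L" are an assumed encoding of the WDL column.
    """
    gd = goal_difference(home_score, away_score)
    if gd > 0:
        return "W"  # home team scored more: win
    if gd < 0:
        return "L"  # home team scored fewer: loss
    return "D"      # equal scores: draw


# A home side that scores 4 and concedes 2 has GD = 2 and a home win.
print(goal_difference(4, 2), home_result(4, 2))  # prints: 2 W
```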
For instance, in the table above we can observe a match that took place on December 31st, 2016, between Chelsea and Stoke City of the English Premier League (ENG1), which ended in a 4-2 win for the home team.
Now, let’s start drawing some conclusions from our data, beginning with a basic one: scoreless matches. For that, we create a filter, setting both HS and AS to zero:
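For readers who want to reproduce the same filter outside Gigasheet, it can be sketched in a few lines of plain Python. This is only an illustration: the column names follow the article, the Chelsea row comes from the example above, and the remaining rows (league codes and team names included) are made up.

```python
# Column names follow the article; every row except the first is made up.
matches = [
    {"LGE": "ENG1", "HT": "Chelsea", "AT": "Stoke City", "HS": 4, "AS": 2},
    {"LGE": "XXX1", "HT": "Home A", "AT": "Away A", "HS": 0, "AS": 0},
    {"LGE": "XXX1", "HT": "Home B", "AT": "Away B", "HS": 0, "AS": 1},
    {"LGE": "YYY1", "HT": "Home C", "AT": "Away C", "HS": 0, "AS": 0},
]


def scoreless(rows):
    """Keep only matches where both the home and away scores are zero."""
    return [row for row in rows if row["HS"] == 0 and row["AS"] == 0]


scoreless_matches = scoreless(matches)
print(len(scoreless_matches))  # prints: 2
```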
With 18,707 matches ending in a scoreless draw, we turn to the Group feature (on the LGE column, which holds the league code) to see how these results are distributed across individual leagues:
Immediately, we can see the five leagues with the most matches ending in a 0-0 draw, along with the number of such matches in each league:
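The grouping step has a simple equivalent in code as well. The sketch below counts 0-0 matches per league and returns the top entries; the function name, league codes and rows are hypothetical, and it is not meant to describe how Gigasheet implements its Group feature.

```python
from collections import Counter


def top_scoreless_leagues(rows, n=5):
    """Count 0-0 matches per league and return the n most frequent leagues."""
    counts = Counter(
        row["LGE"] for row in rows if row["HS"] == 0 and row["AS"] == 0
    )
    return counts.most_common(n)


# Hypothetical rows: only the league code and the two scores matter here.
sample = [
    {"LGE": "AAA1", "HS": 0, "AS": 0},
    {"LGE": "AAA1", "HS": 0, "AS": 0},
    {"LGE": "BBB1", "HS": 0, "AS": 0},
    {"LGE": "BBB1", "HS": 2, "AS": 1},
    {"LGE": "CCC1", "HS": 1, "AS": 1},  # a draw, but not scoreless
]

print(top_scoreless_leagues(sample))  # prints: [('AAA1', 2), ('BBB1', 1)]
```

Dividing each count by the league's total number of matches would give a rate rather than a raw count, which is a fairer comparison when leagues differ in size.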