  • Gianni Perez

Data Analysis: Facts About Soccer

Every four years, a flurry of activity and joyful fanaticism descends upon a designated host nation for what is arguably the biggest sporting event in the world. Yes, we’re talking about the FIFA World Cup! With the tournament in full swing, what better opportunity to showcase the power of Gigasheet?

As with any mass spectator sport, numbers rule; that is, data and more data. From player and match analyses to the number of committed fans filling entire stadiums in unrelenting allegiance to their favorite team, football (a.k.a. soccer) naturally invites all manner of statistical inference and forecasting.

However, for all its gilded touches, football is also beset by the sort of predictability that comes with monetization. For example, in a recently published paper from The Royal Society, the four major European leagues (England, Germany, Portugal and Spain) exhibited a clear evolution towards inequality based on financial indicators, a gradual process that the study describes as the “gentrification” of football.

But on to the fun part: the paper seemingly highlights the importance of home-field advantage and breaks down match scores for several leagues across the world (from 2000 through 2016) accordingly. Let’s use Gigasheet to get some clarity and check whether home teams still enjoy an advantage despite any other biases.


Exploring the Football (Soccer) Data With Gigasheet

Loading and parsing datasets in Gigasheet takes mere seconds. The tool can handle billions of records at a time, so 200K rows is a walk in the park. Soon enough, our table looks like this:

Soccer Data loaded into Gigasheet


Data in this table are laid out as follows:

  • Season (SEA).

  • League (LGE).

  • Match date (DATE).

  • Home team (HT).

  • Away team (AT).

  • Home team score (HS).

  • Away team score (AS).

  • Goal difference (GD = HS - AS).

  • A single variable indicating whether the final score ended in a win, draw, or loss (WDL) for the home team.
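To make the last two columns concrete, here is a minimal pandas sketch of how GD and WDL could be derived from HS and AS. It is an illustration under assumptions, not the way the dataset was actually built: the "W"/"D"/"L" labels, the function name `add_outcome_columns`, and every sample row except the Chelsea vs. Stoke City score are hypothetical.

```python
import pandas as pd

# Sample rows using the column names listed above.
# Only the Chelsea vs. Stoke City score comes from the article;
# the other two rows are made up for illustration.
matches = pd.DataFrame(
    {
        "LGE": ["ENG1", "ENG1", "ENG1"],
        "HT": ["Chelsea", "Team A", "Team C"],
        "AT": ["Stoke City", "Team B", "Team D"],
        "HS": [4, 0, 1],
        "AS": [2, 0, 2],
    }
)


def add_outcome_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with GD and WDL derived from HS and AS."""
    out = df.copy()
    # Goal difference from the home team's point of view.
    out["GD"] = out["HS"] - out["AS"]
    # Assumed labels: "W" = home win, "D" = draw, "L" = home loss.
    out["WDL"] = out["GD"].apply(
        lambda gd: "W" if gd > 0 else ("L" if gd < 0 else "D")
    )
    return out


matches = add_outcome_columns(matches)
print(matches[["HT", "AT", "HS", "AS", "GD", "WDL"]])
```

Because WDL is defined from the home team's perspective, a positive GD always maps to a home win.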

For instance, in the table above we can observe a match that took place on December 31st, 2016, between Chelsea and Stoke City of the English Premier League (ENG1), which ended in a 4-2 win for the home team.

Now, let’s start drawing some conclusions from our data, beginning with a basic one: scoreless matches. For that, we create a filter that sets both HS and AS to zero:

Using Gigasheet to Filter Soccer Data

Soccer Data Analysis Results
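For readers who want to reproduce this filter outside Gigasheet, a rough pandas equivalent is sketched below. It assumes the data sits in a DataFrame with the HS and AS columns described earlier; the sample rows, the league codes other than ENG1, and the function name `scoreless_draws` are hypothetical.

```python
import pandas as pd


def scoreless_draws(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only matches where both teams scored zero goals."""
    return df[(df["HS"] == 0) & (df["AS"] == 0)]


# Made-up sample rows; only the column names follow the article.
sample = pd.DataFrame(
    {
        "LGE": ["ENG1", "ENG1", "XXX1", "XXX1"],
        "HS": [0, 4, 0, 1],
        "AS": [0, 2, 0, 0],
    }
)

scoreless = scoreless_draws(sample)
print(len(scoreless), "scoreless matches in the sample")
```

Both conditions must hold at once, which is why the two comparisons are combined with `&` rather than applied to a single column.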

With these 18,707 scoreless draws isolated, we turn to the Group feature (on the LGE column) to see how they are distributed across individual leagues:

Using Gigasheet to Group Soccer Stats Data
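The grouping step can be approximated in pandas as well. The sketch below counts scoreless draws per league and keeps the leagues with the highest counts. It is an assumption-laden illustration rather than a description of how Gigasheet works internally: the function name `scoreless_by_league`, the sample rows, and the league codes other than ENG1 are hypothetical.

```python
import pandas as pd


def scoreless_by_league(df: pd.DataFrame, top_n: int = 5) -> pd.Series:
    """Count 0-0 matches per league and return the top_n leagues."""
    scoreless = df[(df["HS"] == 0) & (df["AS"] == 0)]
    # value_counts() sorts from the most to the least frequent league.
    return scoreless["LGE"].value_counts().head(top_n)


# Made-up sample rows; only the column names follow the article.
league_sample = pd.DataFrame(
    {
        "LGE": ["ENG1", "ENG1", "ENG1", "XXX1", "XXX1", "YYY1"],
        "HS": [0, 0, 4, 0, 1, 2],
        "AS": [0, 0, 2, 0, 0, 2],
    }
)

counts = scoreless_by_league(league_sample)
print(counts)
```

Raw counts favor leagues that play more matches per season, so dividing by each league's total number of matches would give a fairer comparison.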

Immediately, we can see the five leagues with the most 0-0 draws, along with the total number of such matches in each league: