Nicholas Steed/

Soccer Data



This dataset contains data on every game of the 2018-2019 English Premier League season.

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd
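A minimal loading sketch, assuming the data ships as a CSV whose columns match the data dictionary. The helper name `load_matches` is hypothetical, the in-memory sample rows are made up for illustration, and the day-first date format is an assumption to check against the actual file.

```python
import io

import pandas as pd


def load_matches(source):
    """Read match data from a path or file-like object (hypothetical helper)."""
    df = pd.read_csv(source)
    # Assumption: dates are written day-first (e.g. 10/08/2018); adjust if not.
    df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)
    return df


# Made-up sample rows standing in for the real file, for illustration only.
sample_csv = io.StringIO(
    "Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR\n"
    "E0,10/08/2018,Team A,Team B,2,1,H\n"
    "E0,11/08/2018,Team C,Team D,0,0,D\n"
)

matches = load_matches(sample_csv)
print(matches.dtypes)
```

Passing the path of the real CSV to `load_matches` instead of `sample_csv` should work the same way, provided the column names match.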


Data Dictionary

Div: Division the game was played in
Date: The date the game was played
HomeTeam: The home team
AwayTeam: The away team
FTHG: Full time home goals
FTAG: Full time away goals
FTR: Full time result
HTHG: Half time home goals
HTAG: Half time away goals
HTR: Half time result
Referee: The referee of the game
HS: Number of shots taken by home team
AS: Number of shots taken by away team
HST: Number of shots taken by home team on target
AST: Number of shots taken by away team on target
HF: Number of fouls made by home team
AF: Number of fouls made by away team
HC: Number of corners taken by home team
AC: Number of corners taken by away team
HY: Number of yellow cards received by home team
AY: Number of yellow cards received by away team
HR: Number of red cards received by home team
AR: Number of red cards received by away team

Source of dataset.

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • 🗺️ Explore: Which team commits the most fouls?
  • 📊 Visualize: Plot the percentage of games that ended in a draw over time.
  • 🔎 Analyze: Does the number of red cards a team receives have an effect on its probability of winning a game?
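One possible starting point for the three challenges above, sketched with pandas. The rows below are made up for illustration and should be replaced with the real match data. The sketch also assumes that `FTR` is coded as "H" (home win), "D" (draw), and "A" (away win), which the data dictionary does not state.

```python
import pandas as pd

# Made-up rows for illustration only; replace with the real match data.
matches = pd.DataFrame({
    "Date": pd.to_datetime(["2018-08-10", "2018-08-11", "2018-09-01", "2018-09-02"]),
    "HomeTeam": ["Team A", "Team C", "Team B", "Team D"],
    "AwayTeam": ["Team B", "Team D", "Team A", "Team C"],
    "FTR": ["H", "D", "A", "D"],  # assumed coding: H / D / A
    "HF": [10, 8, 12, 9],
    "AF": [11, 7, 9, 14],
    "HR": [0, 1, 0, 0],
    "AR": [1, 0, 0, 0],
})

# Explore: total fouls per team = fouls committed at home + fouls committed away.
fouls = (
    matches.groupby("HomeTeam")["HF"].sum()
    .add(matches.groupby("AwayTeam")["AF"].sum(), fill_value=0)
    .sort_values(ascending=False)
)

# Visualize: percentage of games drawn in each calendar month.
# With matplotlib installed, draw_pct.plot() would chart this series.
draw_pct = (
    matches.assign(is_draw=matches["FTR"].eq("D"))
    .groupby(matches["Date"].dt.to_period("M"))["is_draw"]
    .mean()
    .mul(100)
)

# Analyze: reshape to one row per team per game, then compare win rates
# across red-card counts.
home = pd.DataFrame({"reds": matches["HR"], "won": matches["FTR"].eq("H")})
away = pd.DataFrame({"reds": matches["AR"], "won": matches["FTR"].eq("A")})
team_games = pd.concat([home, away], ignore_index=True)
win_rate_by_reds = team_games.groupby("reds")["won"].mean()

print(fouls, draw_pct, win_rate_by_reds, sep="\n\n")
```

The red-card comparison is descriptive only: a lower win rate among teams that received a red card does not by itself show that the card caused the loss, since cards may be more likely for teams that are already behind.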

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

You have just been hired as a data analyst for a local soccer team. The team has recently signed on some junior players and wants to give them as much experience as possible without losing games. If the head coach could be confident in the outcome of a game by halftime, they would be more likely to give the junior players time on the field.

The coach has asked you whether you can predict the outcome of a game by the results at halftime and how confident you would be in the prediction.
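A simple baseline for the coach's question is to tabulate how often each halftime result led to each full-time result. The sketch below does this with `pd.crosstab`. The rows are made up for illustration, and `HTR` and `FTR` are assumed to use the codes "H", "D", and "A". It is one possible first step, not a full predictive model.

```python
import pandas as pd

# Made-up results for illustration only; replace with the real match data.
matches = pd.DataFrame({
    "HTR": ["H", "H", "H", "H", "D", "D", "D", "A", "A"],
    "FTR": ["H", "H", "D", "H", "D", "D", "A", "A", "A"],
})

# Each row is a halftime result; each cell is the share of those games
# that ended with the column's full-time result.
outcome_given_halftime = pd.crosstab(
    matches["HTR"], matches["FTR"], normalize="index"
)

# Baseline prediction: the most frequent full-time result for each halftime
# result, with its observed frequency as a rough confidence.
prediction = outcome_given_halftime.idxmax(axis=1)
confidence = outcome_given_halftime.max(axis=1)

print(outcome_given_halftime)
print(pd.DataFrame({"prediction": prediction, "confidence": confidence}))
```

On a single season these frequencies rest on a limited number of games per halftime result, so holding out part of the data, or reporting an interval around each frequency, would give the coach a more honest picture of how reliable the prediction is.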

You will need to prepare a report that is accessible to a broad audience. It should outline your motivation, steps, findings, and conclusions.
