Soccer Through the Ages

    This dataset contains information on international soccer games throughout the years. It includes results of soccer games and information about the players who scored the goals. The dataset contains data from 1872 up to 2023.

    💾 The data

    • data/results.csv - CSV with results of soccer games between 1872 and 2023
      • home_score - The score of the home team, excluding penalty shootouts
      • away_score - The score of the away team, excluding penalty shootouts
      • tournament - The name of the tournament
      • city - The name of the city where the game was played
      • country - The name of the country where the game was played
      • neutral - Whether the game was played at a neutral venue or not
    • data/shootouts.csv - CSV with results of penalty shootouts in the soccer games
      • winner - The team that won the penalty shootout
    • data/goalscorers.csv - CSV with information on goal scorers of some of the soccer games in the results CSV
      • team - The team that scored the goal
      • scorer - The player who scored the goal
      • minute - The minute in the game when the goal was scored
      • own_goal - Whether it was an own goal or not
      • penalty - Whether the goal was scored as a penalty or not

    The following columns can be found in all datasets:

    • date - The date of the soccer game
    • home_team - The team that played at home
    • away_team - The team that played away

    These shared columns fully identify the game that was played and can be used to join data between the different CSV files.
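
As a minimal sketch of such a join in pandas, the snippet below merges two tiny hand-made frames on the three shared columns. The rows, team names, and player names are made-up placeholders, not records from the CSV files; with the real data the frames would come from pd.read_csv instead.

```python
import pandas as pd

# Hypothetical miniature stand-ins for data/results.csv and data/goalscorers.csv.
results = pd.DataFrame({
    "date": ["2000-01-01", "2000-01-02"],
    "home_team": ["Team A", "Team C"],
    "away_team": ["Team B", "Team D"],
    "home_score": [2, 0],
    "away_score": [1, 0],
})
goalscorers = pd.DataFrame({
    "date": ["2000-01-01"] * 3,
    "home_team": ["Team A"] * 3,
    "away_team": ["Team B"] * 3,
    "team": ["Team A", "Team B", "Team A"],
    "scorer": ["Player 1", "Player 2", "Player 1"],
    "minute": [10, 55, 80],
})

# The three columns shared by every file identify a game.
KEYS = ["date", "home_team", "away_team"]

# One row per goal, enriched with the final score of the matching game.
goals_with_scores = goalscorers.merge(results, on=KEYS, how="inner")
print(goals_with_scores[["scorer", "minute", "home_score", "away_score"]])
```

An inner join keeps only goals whose game appears in both frames; how="left" would keep every goal even when no matching result row exists.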

    Source: GitHub

    📊 Some guiding questions and visualizations to help you explore this data:

    1. Which are the 15 countries that have won the most games since 1960? Show them in a horizontal bar plot.

    2. How many goals are scored in total per minute of the game? Show this in a bar chart with minutes on the x-axis. If you are up for a challenge, you can even create an animated chart showing how it has changed over the years.

    3. Who are the ten players who have scored the most hat-tricks?

    4. What is the proportion of games won by each team at home and away? What is the difference between the proportions?

    5. How many games have been won by the home team? And by the away team?
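
One possible starting point for questions 1 and 5 is to derive a winner for each game from the two score columns. The sketch below does this on a few made-up rows laid out like data/results.csv; the team names and scores are placeholders. Because the score columns exclude penalty shootouts, a game decided by a shootout counts as a draw here.

```python
import pandas as pd

# Hypothetical sample rows in the layout of data/results.csv (not real results).
results = pd.DataFrame({
    "date": pd.to_datetime(
        ["1959-05-01", "1970-06-01", "1980-06-01", "1990-06-01", "2000-06-01"]
    ),
    "home_team": ["Team A", "Team A", "Team B", "Team C", "Team B"],
    "away_team": ["Team B", "Team B", "Team A", "Team A", "Team C"],
    "home_score": [3, 2, 0, 1, 2],
    "away_score": [0, 1, 1, 1, 0],
})

home_won = results["home_score"] > results["away_score"]
away_won = results["home_score"] < results["away_score"]

# Winning team per game; draws are left as missing values.
results["winner"] = results["home_team"].where(
    home_won, results["away_team"].where(away_won)
)

# Question 5: games won by the home side and by the away side.
home_wins = int(home_won.sum())
away_wins = int(away_won.sum())

# Question 1: teams with the most wins since 1960 (value_counts skips draws).
since_1960 = results["date"] >= pd.Timestamp("1960-01-01")
wins_since_1960 = results.loc[since_1960, "winner"].value_counts().head(15)

print(home_wins, away_wins)
print(wins_since_1960)
```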

    💼 Develop a case study for your portfolio

    After exploring the data, you can create a comprehensive case study using this dataset. We have provided an example objective below, but feel free to come up with your own - the world is your oyster!

    Example objective: The UEFA Euro 2024 tournament is approaching. Utilize the historical data to construct a predictive model that forecasts potential outcomes of the tournament based on the team draws. Since the draws are not known yet, you should be able to configure them as variables in your notebook.

    You can query the pre-loaded CSV files using SQL directly. Here’s a sample query:

    DataFrame available as variable: results
    SELECT
    	*
    FROM 'data/results.csv'
    LIMIT 10

    You can also use SQL cells to join the tables:

    DataFrame available as variable: goalscorers_joined
    SELECT
    	*
    FROM 'data/goalscorers.csv'
    INNER JOIN 'data/results.csv' USING (date, home_team, away_team)
    LEFT JOIN 'data/shootouts.csv' USING (date, home_team, away_team)
    LIMIT 10

    Here we focus on Plotly Express, for these reasons:

    1. Plotly Express is a high-level data visualization library for Python.
    2. It is easy to use: it is designed to be user-friendly and quick to pick up, especially for those who are new to data visualization.
    3. Its visualizations are interactive: you can create plots with features like tooltips, zooming, and panning without writing extensive code for interactivity.
    4. It integrates seamlessly with pandas DataFrames, which are commonly used for data manipulation.
    5. It still allows for customization: you can tweak colors, labels, titles, and other attributes.

    As an alternative to the SQL cells, you can import the data using pandas, for example:
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from plotly import express as px
    from datetime import datetime
    
    results = pd.read_csv("data/results.csv")
    results
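
To illustrate the points made about Plotly Express, here is a minimal sketch of a horizontal bar chart built directly from a pandas DataFrame. The team names and win counts are made-up placeholders, and the helper name wins_bar_chart is hypothetical.

```python
import pandas as pd

# Made-up win counts purely for illustration (not taken from the dataset).
# Sorting ascending puts the largest bar at the top of a horizontal chart.
win_counts = pd.DataFrame({
    "team": ["Team A", "Team B", "Team C"],
    "wins": [12, 30, 21],
}).sort_values("wins")


def wins_bar_chart(df):
    """Return an interactive horizontal bar chart of wins per team."""
    # Imported inside the function so the data preparation above
    # still runs in an environment without plotly installed.
    from plotly import express as px

    return px.bar(
        df,
        x="wins",
        y="team",
        orientation="h",  # horizontal bars
        title="Games won per team (sample data)",
        labels={"wins": "Games won", "team": "Team"},
    )


# In a notebook cell: wins_bar_chart(win_counts).show()
```

The DataFrame is passed in as is, and the column names select the axes. Tooltips, zooming, and panning come with the returned figure without extra code.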

    Steps for exploring our data:

    1. Data collection: here, loading the CSV files.
    2. Data cleaning: removing missing values or outliers in the datasets to ensure data quality.
    3. Data visualization: choosing visualizations that match the data requirements.
    4. Descriptive statistics: mean, median, standard deviation, and so on.
    5. Correlation analysis and distribution analysis.
    # Number of rows and columns
    results.shape
    # Summary of columns, non-null counts, and memory usage
    results.info()
    # Data type of each column
    results.dtypes
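
The cleaning and descriptive-statistics steps listed above could continue along the following lines. This is a sketch on a small made-up frame laid out like data/results.csv; the rows are placeholders, and dropping incomplete rows is only one of several reasonable ways to handle missing values.

```python
import pandas as pd

# Hypothetical sample in the layout of data/results.csv, with one missing score.
results = pd.DataFrame({
    "date": ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05"],
    "home_team": ["Team A", "Team B", "Team A", "Team C", "Team B"],
    "away_team": ["Team B", "Team A", "Team C", "Team A", "Team C"],
    "home_score": [0.0, 4.0, 2.0, 1.0, None],
    "away_score": [0.0, 2.0, 1.0, 3.0, 1.0],
})

# Data cleaning: parse the date strings and count missing values per column.
results["date"] = pd.to_datetime(results["date"])
missing_per_column = results.isna().sum()

# Drop games with an unknown score (an assumption; imputing is another option).
clean = results.dropna(subset=["home_score", "away_score"])

# Descriptive statistics: count, mean, std, min, quartiles, max.
score_stats = clean[["home_score", "away_score"]].describe()

# Correlation between home and away scores.
score_corr = clean["home_score"].corr(clean["away_score"])

print(missing_per_column)
print(score_stats)
print(score_corr)
```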