Exploring World Cup Data in Python (Solution)


    This dataset (source) includes 44,066 results of international football matches, from the very first official match in 1872 up to 2022. The matches range from the FIFA World Cup to the FIFI Wild Cup to regular friendly matches. All are men's full internationals; the data does not include Olympic Games or matches where at least one of the teams was the nation's B-team, U-23 team, or a league select team.

    Task 1: Import and prepare the dataset

    • Import the pandas package with the usual alias.
    # Import the pandas package with the usual alias
    import pandas as pd
    • Read "results.csv". Assign to results.
    • Convert the date column to a datetime.
    • Get the year component of the date column; store in a new column named year.
    # Read results.csv. Assign to results.
    results = pd.read_csv('results.csv')
    
    # Convert the date column to a datetime
    results['date'] = pd.to_datetime(results['date'])
    
    # Get the year component of the date column; store in a new column named year
    results['year'] = results['date'].dt.year
    
    # See the result
    results
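
The import steps above can be tried without the real file. The following is a minimal sketch that assumes a two-row, in-memory stand-in for results.csv; the rows are illustrative, and only the date and tournament column names are taken from the text above. It also shows `parse_dates`, an alternative that converts the column while reading.

```python
import io

import pandas as pd

# Hypothetical stand-in for results.csv (illustrative rows only).
csv_text = """date,tournament
1872-11-30,Friendly
1930-07-13,FIFA World Cup
"""

# Same steps as the cell above: read, convert, extract the year.
sample = pd.read_csv(io.StringIO(csv_text))
sample["date"] = pd.to_datetime(sample["date"])
sample["year"] = sample["date"].dt.year

# Alternative: let read_csv parse the date column while loading.
sample_alt = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(sample.dtypes)
print(sample["year"].tolist())
```

Either route should leave date as a datetime column, which is what makes the `.dt` accessor available.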

    Task 2: Get the FIFA World Cup data

    • Using results, count the number of rows for each tournament value.
    • Convert the results to a DataFrame for nicer printing.
    # Count the number of rows for each tournament; convert to DataFrame
    results \
    	.value_counts("tournament") \
    	.to_frame("num_matches")
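
To see what the counting step produces, here is a small sketch on a made-up DataFrame (the `toy` name and its rows are assumptions for illustration, not part of the dataset). It relies on `DataFrame.value_counts`, which sorts by count in descending order by default.

```python
import pandas as pd

# Hypothetical toy data: six matches across three tournaments.
toy = pd.DataFrame({
    "tournament": [
        "Friendly", "FIFA World Cup", "Friendly",
        "Friendly", "FIFA World Cup", "UEFA Euro",
    ]
})

# Count rows per tournament, then name the count column for display.
counts = toy.value_counts("tournament").to_frame("num_matches")
print(counts)
```

The most frequent tournament appears first, so on the real data the top rows show which competitions dominate the dataset.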
    • Query for the rows where tournament is equal to "FIFA World Cup"
    # Query for the rows where tournament is equal to "FIFA World Cup"
    world_cup_res = results \
    	.query('tournament == "FIFA World Cup"')
    
    # See the results
    world_cup_res
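
For comparison, the same filter can be written with a boolean mask, or with a variable referenced inside the query string via `@`. This is a sketch on hypothetical toy rows (`toy` and `target` are illustrative names), not the notebook's own data.

```python
import pandas as pd

# Hypothetical toy data with the two columns used in this task.
toy = pd.DataFrame({
    "tournament": ["Friendly", "FIFA World Cup", "FIFA World Cup", "UEFA Euro"],
    "year": [1929, 1930, 1934, 1960],
})

# 1) query with a literal string, as in the cell above.
by_query = toy.query('tournament == "FIFA World Cup"')

# 2) Equivalent boolean-mask indexing.
by_mask = toy[toy["tournament"] == "FIFA World Cup"]

# 3) query referencing a local Python variable with @.
target = "FIFA World Cup"
by_variable = toy.query("tournament == @target")

print(by_query)
```

All three should select the same rows; `query` tends to read more cleanly in a method chain, while the mask form avoids quoting a string inside a string.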

    Task 3: Your turn: How many matches in every world cup?

    • Using world_cup_res, count the number of rows for each year value.
    • Convert the results to a DataFrame for nicer printing.
    # Count the number of rows for each year; convert to DataFrame
    matches_per_year = world_cup_res \
    	.value_counts("year") \
    	.to_frame("num_matches")
    
    # See the results
    matches_per_year
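
Because `value_counts` orders rows by count rather than by year, a chronological view needs an extra sort. The sketch below uses made-up years and counts (the `toy` frame is hypothetical) and adds `sort_index()`, which is an optional step not present in the solution above.

```python
import pandas as pd

# Hypothetical toy data: years are illustrative, counts are not real.
toy = pd.DataFrame({"year": [1934, 1930, 1934, 1938, 1934, 1930]})

# Default order: most frequent year first.
by_count = toy.value_counts("year").to_frame("num_matches")

# Optional: reorder chronologically by sorting on the year index.
chronological = by_count.sort_index()

print(by_count)
print(chronological)
```

A chronological order is usually the more convenient one when the counts are later drawn against the year on an axis.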
    • Import the plotly.express package using the alias px.