Covid Cases by US State

    One of the best ways to improve your data visualization skills is to try to replicate great visualizations you come across. Using the dataset provided in this template, you can see how to recreate such visualizations in Python and take your data visualization skills to the next level.

    You can try to replicate the following visualization from the New York Times, published on March 21, 2020, which shows the spread of COVID by state. Read the original article to get a better understanding.

    You will need two datasets to replicate this plot. The first dataset is provided by the New York Times and contains a time series of cumulative COVID cases and deaths by state and date. The second dataset maps each state to x-y coordinates on a grid. Use it to place the per-state panels in the right positions.
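The panel placement described above can be sketched as follows. This is only a minimal illustration on toy data, not the New York Times' method: the function name `plot_state_grid` and the coordinate column names `x` and `y` are assumptions, so adjust them to whatever `state_coords.csv` actually contains. It also assumes `y` increases downward.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_state_grid(cases, coords):
    """Draw one small line chart per state, placed by its grid coordinates.

    Assumes `cases` has columns date, state, cases and `coords` has
    columns state, x, y (hypothetical names).
    """
    # Attach each state's grid position to its time series
    data = cases.merge(coords, on="state", how="inner")

    # Shift coordinates so the smallest x and y map to column 0 and row 0
    x_min, y_min = int(coords["x"].min()), int(coords["y"].min())
    n_cols = int(coords["x"].max()) - x_min + 1
    n_rows = int(coords["y"].max()) - y_min + 1

    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(1.2 * n_cols, 1.0 * n_rows), squeeze=False
    )
    # Hide every cell first; only cells that hold a state are switched back on
    for ax in axes.flat:
        ax.set_axis_off()

    panels = {}
    for (state, x, y), group in data.groupby(["state", "x", "y"]):
        ax = axes[int(y) - y_min, int(x) - x_min]
        ax.set_axis_on()
        group = group.sort_values("date")
        ax.plot(group["date"], group["cases"], linewidth=1)
        ax.set_title(state, fontsize=7)
        ax.set_xticks([])
        ax.set_yticks([])
        panels[state] = ax
    return fig, panels


# Toy inputs with made-up numbers and made-up grid positions
toy_cases = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02"] * 3),
    "state": ["Washington", "Washington", "Oregon", "Oregon", "Idaho", "Idaho"],
    "cases": [10, 15, 1, 3, 0, 2],
})
toy_coords = pd.DataFrame({
    "state": ["Washington", "Oregon", "Idaho"],
    "x": [1, 1, 2],
    "y": [1, 2, 2],
})

fig, panels = plot_state_grid(toy_cases, toy_coords)
```

With the real data, the same call would take `covid_cases` and `state_coords`, provided their column names match the assumptions above.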

    # Load packages
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Render figures at retina resolution (IPython magic; only works in a notebook)
    %config InlineBackend.figure_format = 'retina'

    Load your data

    # Covid Cases by State
    covid_cases = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
    covid_cases.head(100)
    # Grid Coordinates for States
    # Source: https://github.com/hrbrmstr/statebins/blob/master/R/aaa.R
    state_coords = pd.read_csv('state_coords.csv')
    state_coords.head(100)
    # Merge covid_cases and state_coords datasets
    merged_data = pd.merge(covid_cases, state_coords, how='inner', on='state')
    merged_data.head()
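Because the merge is an inner join, any state whose name does not appear in both tables is dropped silently. A quick way to check this, sketched here on toy stand-ins for `covid_cases` and `state_coords` (the values and the `x`/`y` column names are made up), is an outer merge with `indicator=True`:

```python
import pandas as pd

# Toy stand-ins for covid_cases and state_coords (hypothetical values)
cases = pd.DataFrame({
    "state": ["New York", "Guam", "Texas"],
    "cases": [5, 1, 3],
})
coords = pd.DataFrame({
    "state": ["New York", "Texas", "Ohio"],
    "x": [9, 4, 7],
    "y": [3, 8, 4],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
check = cases.merge(coords, on="state", how="outer", indicator=True)

only_in_cases = sorted(check.loc[check["_merge"] == "left_only", "state"].unique())
only_in_coords = sorted(check.loc[check["_merge"] == "right_only", "state"].unique())

print(only_in_cases)   # states with case data but no grid position
print(only_in_coords)  # grid positions with no case data
```

On this toy input the first list contains only Guam and the second only Ohio. An empty first list on the real data would mean the inner join lost no case rows.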
    # Apply seaborn's default theme (set_theme() is the current name for set())
    sns.set_theme()
    # Create a histogram of the 'cases' values (cumulative counts per state and date) in merged_data
    sns.histplot(data=merged_data, x="cases", kde=True)
    plt.title("Distribution of COVID-19 Cases")
    plt.xlabel("Cases")
    plt.ylabel("Frequency")
    plt.show()

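    Calculate cases between 2020-01-21 and 2020-01-31

    # Convert 'date' column to datetime format
    merged_data['date'] = pd.to_datetime(merged_data['date'])

    # Filter data for January 21-31, 2020
    jan_2020_data = merged_data[(merged_data['date'] >= '2020-01-21') & (merged_data['date'] <= '2020-01-31')]

    # 'cases' and 'deaths' are cumulative counts, so summing every row would
    # count the same cases once per day. Take each state's latest value instead.
    latest_jan_2020 = jan_2020_data.sort_values('date').groupby('state')[['cases', 'deaths']].last()

    # Calculate total cases and deaths reported by January 31, 2020
    total_cases_jan_2020 = latest_jan_2020['cases'].sum()
    total_deaths_jan_2020 = latest_jan_2020['deaths'].sum()

    total_cases_jan_2020, total_deaths_jan_2020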
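The New York Times file reports cumulative counts, so totals over a date range are easier to reason about after converting to daily new cases. A minimal sketch on made-up numbers follows. The `new_cases` column name is an assumption, and on real data the difference can be negative when a cumulative count is revised downward.

```python
import pandas as pd

# Toy cumulative counts for two states (made-up numbers)
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-21", "2020-01-22", "2020-01-23",
        "2020-01-22", "2020-01-23",
    ]),
    "state": ["Washington"] * 3 + ["Illinois"] * 2,
    "cases": [1, 1, 2, 1, 3],
})

# Sort so that diff() compares consecutive days within each state
toy = toy.sort_values(["state", "date"]).reset_index(drop=True)

# Daily new cases = change in the cumulative count; a state's first
# reported day has no previous row, so use its cumulative value directly
toy["new_cases"] = (
    toy.groupby("state")["cases"].diff().fillna(toy["cases"]).astype(int)
)

naive_total = toy["cases"].sum()                            # sums cumulative rows
total_new = toy["new_cases"].sum()                          # sums daily increments
latest_total = toy.groupby("state")["cases"].last().sum()   # latest value per state

print(naive_total, total_new, latest_total)
```

Here the naive sum of the cumulative column gives 8, while both the sum of daily increments and the sum of each state's latest value give 5.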
    Number of cases in New York State per date
    # Filter data for New York State (us-states.csv is state-level, so 'New York' is the whole state, not New York City)
    ny_data = merged_data[merged_data['state'] == 'New York']

    # Group data by date and calculate total cases for each date
    cases_per_date_ny = ny_data.groupby('date')['cases'].sum().reset_index()

    cases_per_date_ny

    # Create a line plot of New York State COVID-19 cases over time
    sns.lineplot(data=cases_per_date_ny, x="date", y="cases")
    plt.title("COVID-19 Cases Over Time in New York State")
    plt.xlabel("Date")
    plt.ylabel("Cumulative Cases")
    plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
    plt.show()
    Average cumulative case count in each year
    # Extract year from the 'date' column (requires 'date' to be datetime)
    merged_data['year'] = merged_data['date'].dt.year

    # Group by year and average the cumulative 'cases' values over all state-day rows
    average_cases_per_year = merged_data.groupby('year')['cases'].mean().reset_index()
    
    average_cases_per_year
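Averaging a cumulative column mostly reflects how late in the pandemic each year falls. If the goal is the number of new cases reported in each year, one option is to take each state's last cumulative value per year and difference the yearly totals. This is a sketch on made-up numbers with hypothetical state names `A` and `B`, not part of the original analysis:

```python
import pandas as pd

# Toy cumulative counts for two hypothetical states over two years
toy = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-06-30", "2020-12-31", "2021-06-30", "2021-12-31"] * 2
    ),
    "state": ["A"] * 4 + ["B"] * 4,
    "cases": [10, 30, 45, 70, 5, 10, 20, 25],
})
toy["year"] = toy["date"].dt.year

# Last cumulative value per state within each year (rows: year, columns: state)
year_end = (
    toy.sort_values("date")
       .groupby(["state", "year"])["cases"]
       .last()
       .unstack("state")
)

# Cumulative total across states at the end of each year
cumulative_total = year_end.sum(axis=1)

# New cases per year = change in the year-end total; the first year
# has no previous year, so its year-end total is used directly
new_per_year = cumulative_total.diff().fillna(cumulative_total).astype(int)

print(new_per_year)
```

For the toy data this yields 40 new cases in 2020 and 55 in 2021.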