Everyone Can Learn Python and SQL! Analysis of Canadian cars CO2 emissions and bike sales!

    Everyone Can Learn Python Scholarship

    1️⃣ Python 🐍 - CO2 Emissions

    Now let's move on to the competition and challenge.

    📖 Background

    You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.

    After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.

    CO2 emissions importance

    Knowing how much CO2 cars emit is essential for a country that wants to implement policies to mitigate climate change. The transportation sector is a significant source of greenhouse gas emissions globally, contributing around 24% of total CO2 emissions, so it is imperative for a country to track the carbon emissions of the vehicles on its roads. Doing so enables the country to develop effective policies and regulations to reduce emissions, promote cleaner and more efficient vehicles, and incentivize alternative modes of transportation, such as public transport and cycling.

    Moreover, tracking car CO2 emissions helps countries monitor their progress towards international targets and commitments, such as the Paris Agreement's goal of limiting global warming to well below 2°C above pre-industrial levels. It also helps them identify the sectors where the largest carbon reductions can be achieved and prioritize their efforts accordingly. This information is critical for policymakers to evaluate the effectiveness of their policies, make necessary adjustments, and ensure that the country is on track to meet its climate goals.

    Finally, tracking car CO2 emissions can also have economic benefits. It can promote the development of cleaner technologies and stimulate innovation in the transportation sector, leading to new jobs and economic growth. Furthermore, reducing carbon emissions can help decrease the country's dependence on imported oil, increase energy security, and improve air quality, leading to a healthier population and a more sustainable future.

    In conclusion, knowing car CO2 emissions is essential for a country to develop effective policies, monitor its progress towards international targets, and realize the economic and environmental benefits of reducing carbon emissions. This analysis therefore aims to determine the best factors for identifying cars that produce lower CO2 emissions.

    💾 The data I

    You have access to seven years of CO2 emissions data for Canadian vehicles:

    • "Make" - The company that manufactures the vehicle.
    • "Model" - The vehicle's model.
    • "Vehicle Class" - Vehicle class by utility, capacity, and weight.
    • "Engine Size(L)" - The engine's displacement in liters.
    • "Cylinders" - The number of cylinders.
    • "Transmission" - The transmission type: A = Automatic, AM = Automated manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
    • "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = Natural gas.
    • "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
    • "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.

    The data comes from the Government of Canada's open data website.
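
    The transmission codes described above combine a type prefix with an optional gear count, so they can be split mechanically. The following is a minimal sketch and not part of the original notebook: the `parse_transmission` helper and the `Transmission` tuple are hypothetical names, and the label dictionary only restates the column description.

```python
import re
from typing import NamedTuple, Optional

# Labels restated from the "Transmission" column description.
TRANSMISSION_TYPES = {
    "A": "Automatic",
    "AM": "Automated manual",
    "AS": "Automatic with select shift",
    "AV": "Continuously variable",
    "M": "Manual",
}


class Transmission(NamedTuple):
    """Hypothetical container for a decoded transmission code."""
    kind: str
    gears: Optional[int]  # None when the code carries no gear count


def parse_transmission(code: str) -> Transmission:
    """Split a code such as 'AS6' into its type label and number of gears."""
    match = re.fullmatch(r"([A-Z]+)(\d+)?", code.strip().upper())
    if match is None or match.group(1) not in TRANSMISSION_TYPES:
        raise ValueError(f"Unrecognized transmission code: {code!r}")
    letters, digits = match.groups()
    return Transmission(TRANSMISSION_TYPES[letters], int(digits) if digits else None)


decoded = parse_transmission("AS6")
```

    Applied to the whole column (for example with `Series.map`), a helper like this would allow transmission type and gear count to be analyzed as two separate variables.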

    # Install scikit-posthocs, then import the analysis and plotting packages
    !pip install scikit_posthocs
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pingouin as pg
    import scikit_posthocs as sp
    
    # Custom graphic and theme
    custom_params = {"axes.spines.right": False,
                     "axes.spines.top": False,
                     'grid.color': '#dedede'}
    sns.set_theme(context='paper',style='ticks',rc=custom_params,
                  palette='colorblind',font_scale=1.2)
    
    # Constant variables in case needed
    img_dir = 'img_created/'
    img_save_params = dict(dpi=300, bbox_inches='tight', pad_inches=0.3)
    model_dir ='models_created'
    seed=23
    
    #@title Functions
    # Helper functions used throughout the notebook
    def explore(df):
        ret = pd.DataFrame(columns=['Type','Min','Max',"Nan %",
                                    '# Values','Unique values'])
        for col , content in df.items():
            values= []
            values.append(content.dtype) #look for actual type
            try:
                # Compute both before appending so a failure never leaves a partial row
                col_min, col_max = content.min(), content.max()
            except Exception:
                col_min, col_max = 'None', 'None'
            values.extend([col_min, col_max])  # Min, Max
            values.append(df[col].isnull().sum()/len(df[col]) *100) #% of Nan's
            values.append(content.nunique()) #count # of uniques
            values.append(content.unique())  #display unique values
            
            ret.loc[col]=values
        ret.index.names= ['Variables']
        print('Total Rows:',df.shape[0])
        print('Total Columns:',df.shape[1])
        return ret
    
    def set_fig_caption(fig,fig_number,x=0,y=-0.01,gap=0.05,title=None,caption:str=None,size=8):
        if not title:
            title=''
        fig.text(x,y,f'Figure {fig_number}. {title.title()}',
                 weight='bold',size=size+1)
        if caption: 
            y -= 0.02 * caption.count('\n')  # shift down 0.02 per line break in the caption
            fig.text(x,y-gap,caption,color='#474949',
                     ma='left',wrap=True,size=size)
        return fig 
    
    # Load the data
    cars = pd.read_csv('data/co2_emissions_canada.csv')
    
    # Creating a copy just in case we need original information
    df = cars.copy()
    cars.head()
    
    # Changing variable names for easier typing
    column_names = cars.columns.values.tolist() + ['Fuel Economy (km/L)','CO2 Emissions per Unit of Swept Volume']
    working_column_names = ['make','model','vclass','engine_s','cylinders',
                            'transmission','fuel_type','fuel_consumption_comb','co2_emi']
    cars.columns = working_column_names

    Creating new variables

    • Fuel economy ('fuel_economy'): The distance traveled by a vehicle per unit of fuel consumed, typically measured in kilometers per liter (km/L) or miles per gallon (mpg). It measures how efficiently a vehicle uses fuel to cover a given distance. Fuel Economy (km/L) = 100 / Fuel Consumption Comb (L/100 km)

    • CO2 Emissions per Unit of Swept Volume (engine capacity) ('co2_usv'): This variable could provide information about the combustion efficiency of the engine. A high value means the car emits more CO2 relative to its engine capacity, indicating less efficient combustion and thus higher pollution. CO2 Emissions per Unit of Swept Volume = CO2 Emissions(g/km) / (Engine Size(L) * Cylinders)

    Note: fuel consumption tells you how much fuel a vehicle uses over a fixed distance, while fuel economy tells you how far that vehicle can go on a fixed amount of fuel. The two are reciprocals of each other and carry the same information; fuel economy is simply the more intuitive framing, since a higher value means a more efficient vehicle.
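
    The two formulas above can be sketched on a few rows as follows. The values are made up for illustration and do not come from the dataset; only the short column names ('engine_s', 'cylinders', 'fuel_consumption_comb', 'co2_emi') and the new variable names are taken from this notebook.

```python
import pandas as pd

# Toy rows with made-up values, using the notebook's short column names.
toy = pd.DataFrame({
    "engine_s": [2.0, 3.5, 5.0],                # engine size in liters
    "cylinders": [4, 6, 8],
    "fuel_consumption_comb": [8.0, 10.0, 12.5],  # L/100 km
    "co2_emi": [184, 230, 290],                  # g/km
})

# Fuel Economy (km/L) = 100 / Fuel Consumption Comb (L/100 km)
toy["fuel_economy"] = 100 / toy["fuel_consumption_comb"]

# CO2 per unit of swept volume = CO2 Emissions / (Engine Size * Cylinders)
toy["co2_usv"] = toy["co2_emi"] / (toy["engine_s"] * toy["cylinders"])

print(toy[["fuel_economy", "co2_usv"]])
```

    In this toy example the first row consumes 8 L/100 km, which corresponds to 12.5 km/L.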


    Data cleaning

    We will use the explore function defined above to inspect our data: the total number of rows and columns and, for each column, its data type, minimum, maximum, percentage of missing values, number of unique values, and the unique values themselves. This information is useful for deciding which data cleaning actions to take.
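
    For comparison, a similar per-column overview can be assembled from pandas built-ins alone. This is a simplified sketch, not the notebook's explore function: `summarize` is a hypothetical name, the toy data is made up, and unlike explore it reports the minimum and maximum only for numeric columns.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column overview: dtype, missing-value percentage, unique count, min, max."""
    summary = pd.DataFrame({
        "Type": df.dtypes.astype(str),
        "NaN %": df.isna().mean() * 100,  # share of missing values per column
        "# Values": df.nunique(),         # unique values, NaN excluded
    })
    # Min and max are computed for numeric columns only; other rows stay NaN.
    numeric = df.select_dtypes(include="number")
    summary["Min"] = numeric.min()
    summary["Max"] = numeric.max()
    summary.index.name = "Variables"
    return summary


# Made-up rows: one missing manufacturer to show the NaN percentage.
toy = pd.DataFrame({
    "make": ["FORD", "BMW", "FORD", None],
    "cylinders": [4, 6, 8, 4],
})
overview = summarize(toy)
print(overview)
```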


    As we can see, there is no missing data in our set, and the variables are of various types. We will therefore split them by type and conduct an appropriate exploratory analysis for each.
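
    One common way to split columns by type is `DataFrame.select_dtypes`. The notebook's own segmentation code is hidden, so the following is only an assumed sketch on made-up rows, with `numeric_cols` and `categorical_cols` as hypothetical names.

```python
import pandas as pd

# Made-up rows using the notebook's short column names.
toy = pd.DataFrame({
    "make": ["FORD", "BMW"],
    "fuel_type": ["X", "Z"],
    "engine_s": [2.0, 3.0],
    "cylinders": [4, 6],
})

# Numeric columns go to one group, everything else is treated as categorical.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```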


    Categorical Summary

    The top 5 companies by number of vehicles are Ford, Chevrolet, BMW, Mercedes-Benz, and Porsche.

    • The top 5 vehicle categories by frequency are SUV-SMALL, followed by MID-SIZE, COMPACT, SUV-STANDARD, and FULL-SIZE.
    • The top 5 most frequent transmissions are AS6, AS8, M6, A6, and A8.
    • The most frequent fuels are Regular and Premium (together they make up more than 80%), followed by Ethanol and Diesel. There is only 1 value for Natural Gas.
    • The top 5 most frequent models are the F-150 FFV 4X4, followed by the F-150 FFV, Mustang, Focus FFV, and Sonic.
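
    Frequency rankings and percentage shares like the ones listed above are typically obtained with `Series.value_counts`. The sketch below uses made-up rows, so its counts do not reproduce the figures in the summary; `top_makes` and `fuel_share` are hypothetical names.

```python
import pandas as pd

# Made-up rows; the real summary is computed on the full dataset.
toy = pd.DataFrame({
    "make": ["FORD", "FORD", "FORD", "BMW", "BMW", "KIA"],
    "fuel_type": ["X", "X", "Z", "Z", "X", "D"],
})

# Most frequent categories, sorted from most to least common.
top_makes = toy["make"].value_counts().head(5)

# Share of each fuel type as a percentage of all rows.
fuel_share = toy["fuel_type"].value_counts(normalize=True) * 100

print(top_makes)
print(fuel_share.round(1))
```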

    Next we describe the numerical variables.