EDA: CO2 Emissions and Bicycle Market Analysis
1️⃣ Python 🐍 - CO2 Emissions
📖 Background
You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.
After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.
💾 The data I
You have access to seven years of CO2 emissions data for Canadian vehicles:
- "Make" - The company that manufactures the vehicle.
- "Model" - The vehicle's model.
- "Vehicle Class" - Vehicle class by utility, capacity, and weight.
- "Engine Size(L)" - The engine's displacement in liters.
- "Cylinders" - The number of cylinders.
- "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
- "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = Natural gas.
- "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
- "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.
The data comes from the Government of Canada's open data website.
# Import the packages used for analysis (pandas, NumPy) and plotting (matplotlib, seaborn)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.lines as mlines
# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')
# Create one NumPy array per column
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
# Preview the dataframe
cars
# Look at the first ten items in the CO2 emissions array
cars_co2_emissions[:10]
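The "Transmission" codes described above combine a letter code for the transmission type with an optional gear count. A minimal sketch of how such codes could be split into two fields is shown below. The sample values and the names `sample`, `parts`, and `kind_names` are hypothetical, chosen for illustration, and the real column may contain codes this pattern does not cover.

```python
import pandas as pd

# Hypothetical sample of Transmission codes (not read from the dataset)
sample = pd.Series(["AS5", "M6", "AV", "A10", "AM7"], name="Transmission")

# Letter prefix = transmission type, optional trailing digits = number of gears.
# Two-letter codes are listed before "A" so that "AS5" is not read as "A" + "S5".
parts = sample.str.extract(r"^(?P<kind>AM|AS|AV|A|M)(?P<gears>\d+)?$")

# Gear counts become numbers; codes without a gear count (e.g. "AV") become NaN
parts["gears"] = pd.to_numeric(parts["gears"])

# Human-readable names taken from the data description above
kind_names = {
    "A": "Automatic",
    "AM": "Automatic Manual",
    "AS": "Automatic with select shift",
    "AV": "Continuously variable",
    "M": "Manual",
}
parts["kind_name"] = parts["kind"].map(kind_names)

print(parts)
```

Applied to the real data, the same pattern could be run on `cars['Transmission']`; any row where `kind` comes back missing would point to a code not listed in the description.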
💪 Challenge I
Help your colleague gain insights on the types of vehicles that have lower CO2 emissions. Include:
- What is the median engine size in liters?
- What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
- What is the correlation between fuel consumption and CO2 emissions?
- Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
- What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
- Any other insights you found during your analysis?
- What is the median engine size in liters?
The median engine size is 3.0 liters.
# Checking the median engine size
median_engine = cars['Engine Size(L)'].median()
median_engine
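The same statistic can also be computed from a NumPy array such as `cars_engine_sizes`. The sketch below uses a small set of made-up engine sizes (not the real data) to show that `np.median` and the pandas `.median()` method agree, and that the median sits below the mean when a few large engines stretch the upper end of the distribution.

```python
import numpy as np
import pandas as pd

# Hypothetical engine sizes in liters, not taken from the real dataset
engine_sizes = np.array([2.0, 2.4, 1.5, 3.5, 3.7, 5.6])

# With an even number of values, the median is the average of the two middle values
median_np = np.median(engine_sizes)
median_pd = pd.Series(engine_sizes).median()
mean_np = engine_sizes.mean()

print(f"median (NumPy):  {median_np:.2f}")
print(f"median (pandas): {median_pd:.2f}")
print(f"mean:            {mean_np:.2f}")
```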
- What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
The average fuel consumption, in liters per 100 km, for each of these fuel types is:
- Diesel (D) = 8.84 L/100 km
- Regular gasoline (X) = 10.08 L/100 km
- Premium gasoline (Z) = 11.42 L/100 km
- Ethanol (E) = 16.86 L/100 km
# Define a dictionary to map the fuel type codes to their names
fuel_type_names = {'X': 'Regular Gasoline', 'Z': 'Premium Gasoline', 'E': 'Ethanol', 'D': 'Diesel', 'N': 'Natural Gas'}
# Create a color palette
palette = {'X': '#77AC30', 'Z': '#D9AF3B', 'E': '#BA3A0A', 'D': '#A6A6A6'}
# Checking average fuel consumption by type, excluding natural gas (N).
avg_fuel_consumption = cars.groupby('Fuel Type')['Fuel Consumption Comb (L/100 km)'].mean().drop('N').sort_values()
# Plotting the average fuel consumption
sns.set(style="whitegrid")
ax = sns.barplot(x=avg_fuel_consumption.index, y=avg_fuel_consumption, palette=palette)
# Add a data label with the average value on top of each bar
for i, v in enumerate(avg_fuel_consumption):
    ax.text(i, v, "{:.2f}".format(v), ha='center', fontweight='light')
# Title and labels
plt.xlabel('Fuel Type')
plt.ylabel('Fuel Consumption (L/100 km)')
plt.title('Average fuel consumption by type')
plt.xticks(range(len(avg_fuel_consumption.index)), [fuel_type_names.get(fuel_type, 'Unknown') for fuel_type in avg_fuel_consumption.index], rotation=45)
plt.tight_layout()
plt.show()
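To make the grouping step easier to follow, here is a minimal sketch on a tiny, made-up table (the rows are hypothetical, not values from the dataset). It computes the per-fuel-type averages twice: once with the same `groupby` chain used above, and once with NumPy boolean masks, which is how the arrays created earlier could be used.

```python
import numpy as np
import pandas as pd

col = "Fuel Consumption Comb (L/100 km)"

# Hypothetical rows for illustration only
toy = pd.DataFrame({
    "Fuel Type": ["X", "X", "Z", "D", "E", "Z", "N"],
    col: [9.0, 11.0, 12.0, 8.0, 17.0, 10.0, 13.0],
})

# pandas: group by fuel type, average, drop natural gas, sort ascending
by_pandas = toy.groupby("Fuel Type")[col].mean().drop("N").sort_values()

# NumPy: select the rows of each fuel type with a boolean mask, then average
fuel = toy["Fuel Type"].to_numpy()
cons = toy[col].to_numpy()
by_numpy = {code: cons[fuel == code].mean() for code in ["X", "Z", "E", "D"]}

print(by_pandas)
print(by_numpy)
```

Both approaches should give the same averages; the `groupby` version is shorter and scales to any number of categories without listing them by hand.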
- What is the correlation between fuel consumption and CO2 emissions?
The correlation between fuel consumption and CO2 emissions is strong and positive, with a Pearson correlation coefficient of approximately 0.918. As fuel consumption increases, CO2 emissions increase as well, and because the coefficient is close to +1, the relationship between the two variables is close to linear.
# Checking correlation between the two variables
corr = cars['Fuel Consumption Comb (L/100 km)'].corr(cars['CO2 Emissions(g/km)'])
corr
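By default, the pandas `.corr()` method returns the Pearson coefficient: the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it by hand on a few made-up (consumption, emissions) pairs and checks the result against `np.corrcoef` and pandas. The numbers are hypothetical and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical fuel consumption (L/100 km) and CO2 emissions (g/km) pairs
consumption = np.array([6.0, 8.5, 10.0, 12.5, 15.0])
emissions = np.array([140.0, 198.0, 230.0, 292.0, 345.0])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx, dy the deviations from the mean
dx = consumption - consumption.mean()
dy = emissions - emissions.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Cross-check with the library implementations
r_numpy = np.corrcoef(consumption, emissions)[0, 1]
r_pandas = pd.Series(consumption).corr(pd.Series(emissions))

print(r_manual, r_numpy, r_pandas)
```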