import pandas as pd
import numpy as np
df = pd.read_csv('country_vaccination_stats.csv')
df.head()
df.info()
for column in df:
    print('Column: {} - Unique Values: {}'.format(column, df[column].unique()))
Question 4
Code Implementation Task: Implement code to fill (impute) the missing values in the daily_vaccinations column, per country, with the minimum daily vaccination number of the relevant country.
Note: If a country does not have any valid vaccination number yet, fill it with “0” (zero).
Please provide the link to your code as the answer to this question.
min_vac = df.groupby('country')['daily_vaccinations'].min()
min_vac.head()
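# Fill each country's gaps with that country's minimum. transform('min') returns
# a Series aligned to df's row index, so the assignment cannot misalign.
df['daily_vaccinations'] = df['daily_vaccinations'].fillna(df.groupby('country')['daily_vaccinations'].transform('min'))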
df.head()
df.isnull().sum()
df['daily_vaccinations'] = df['daily_vaccinations'].fillna(0)
df.isnull().sum()
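As a sanity check, the two-step imputation can be sketched on a small hypothetical frame. The `toy` data and the `country_min` name below are illustrative assumptions, not values from country_vaccination_stats.csv. Country C has no valid number, so it should end up as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from country_vaccination_stats.csv.
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B', 'C'],
    'daily_vaccinations': [5.0, np.nan, 3.0, np.nan, 7.0, np.nan],
})

# Per-country minimum, broadcast back onto the original row index.
# min skips NaN, so a country with no valid values (C) stays NaN here.
country_min = toy.groupby('country')['daily_vaccinations'].transform('min')

# Step 1: fill gaps with the country minimum. Step 2: fall back to 0.
toy['daily_vaccinations'] = toy['daily_vaccinations'].fillna(country_min).fillna(0)

print(toy)
```

Rows that already held a value are left unchanged; only the missing entries take the country minimum or the zero fallback.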
Question 6
Code Implementation Task: Implement code to list the top-3 countries with the highest median daily vaccination numbers, using the version of the dataset with missing values imputed. Please provide the link to your code as the answer to this question.
median_vac = df.groupby('country')['daily_vaccinations'].median()
# Top-3 countries by median daily vaccinations, highest first.
median_vac.nlargest(3)
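A minimal sketch of the ranking step on hypothetical toy data (the `toy` frame and the names `median_by_country` and `top3` are assumptions for illustration): `Series.nlargest(3)` picks the three highest per-country medians in descending order.

```python
import pandas as pd

# Hypothetical toy data with no missing values, standing in for the imputed dataset.
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'daily_vaccinations': [10.0, 20.0, 1.0, 3.0, 100.0, 300.0, 50.0, 70.0],
})

# Median daily vaccinations per country (index is the country name).
median_by_country = toy.groupby('country')['daily_vaccinations'].median()

# Three largest medians, sorted from highest to lowest.
top3 = median_by_country.nlargest(3)

print(top3)
```

Sorting with `sort_values(ascending=False).head(3)` would give the same ranking; `nlargest` just states the intent more directly.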