Cleaning Data in Python


👋 Welcome to your workspace! Here, you can write and run Python code and add text in Markdown. Below, we've imported the packages used in the course Cleaning Data in Python and loaded the course datasets as DataFrames. This is your sandbox environment: analyze the course datasets further, take notes, or experiment with code!

%%capture
# Install fuzzywuzzy
!pip install fuzzywuzzy

# Importing course packages; you can add more too!
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import missingno as msno
import fuzzywuzzy
import recordlinkage 

# Importing course datasets as DataFrames
ride_sharing = pd.read_csv('datasets/ride_sharing_new.csv', index_col='Unnamed: 0')
airlines = pd.read_csv('datasets/airlines_final.csv', index_col='Unnamed: 0')
banking = pd.read_csv('datasets/banking_dirty.csv', index_col='Unnamed: 0')
restaurants = pd.read_csv('datasets/restaurants_L2.csv', index_col='Unnamed: 0')
restaurants_new = pd.read_csv('datasets/restaurants_L2_dirty.csv', index_col='Unnamed: 0')

ride_sharing.head() # Display the first five rows of this DataFrame

restaurants_new

# Begin writing your own code here!

Don't know where to start?

Try completing these tasks:

For each DataFrame, inspect the data types of each column and, where needed, clean and convert columns into the correct data type. You should also rename any columns to have more descriptive titles.
Identify and remove all the duplicate rows in ride_sharing.
Inspect the unique values of all the columns in airlines and clean any inconsistencies.
For the airlines DataFrame, create a new column called International from dest_region, where values representing US regions map to False and all other regions map to True.
The banking DataFrame contains out-of-date ages. Update the Age column using today's date and the birth_date column.
Clean the restaurants_new DataFrame so that its categories better match those in the city and type columns of the restaurants DataFrame. Afterward, because the restaurant names contain typos, use record linkage to generate possible pairs of rows between restaurants and restaurants_new, using the criteria you think are best.
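
For the data type and duplicate tasks, here is a minimal sketch on a small made-up DataFrame. The `rides` frame and its `ride_id` and `duration` columns are assumptions for illustration only; check `ride_sharing.dtypes` and `ride_sharing.columns` to see what the real data looks like before adapting it.

```python
import pandas as pd

# Hypothetical stand-in for ride_sharing; the real column names may differ.
rides = pd.DataFrame({
    "ride_id": [101, 102, 102, 103],
    "duration": ["12 minutes", "5 minutes", "5 minutes", "8 minutes"],
})

# Inspect the dtypes first: "duration" is stored as text, not as a number.
print(rides.dtypes)

# Strip the unit and convert to integers, using a more descriptive column name.
rides["duration_minutes"] = (
    rides["duration"].str.replace(" minutes", "", regex=False).astype(int)
)
rides = rides.drop(columns="duration")

# Show every row that takes part in an exact duplicate, then drop the repeats.
duplicate_rows = rides[rides.duplicated(keep=False)]
rides_unique = rides.drop_duplicates().reset_index(drop=True)
```

Looking at `duplicate_rows` before dropping anything is a cheap way to confirm that the repeats really are identical records and not distinct rides that happen to share an ID.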
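
One possible way to build the International column is sketched below. The sample values are invented, and the rule "a region is domestic if its cleaned name ends in `us`" is an assumption; inspect `airlines['dest_region'].unique()` first and adjust the rule to the values you actually find.

```python
import pandas as pd

# Hypothetical sample standing in for airlines["dest_region"],
# including the kind of casing and whitespace inconsistencies to expect.
sample = pd.DataFrame(
    {"dest_region": ["West US", "Europe", "EAST US", " asia", "Midwest US"]}
)

# Normalize whitespace and casing so that variants of a region compare equal.
region = sample["dest_region"].str.strip().str.lower()

# Assumption: US regions are exactly those whose cleaned name ends in " us".
sample["International"] = ~region.str.endswith(" us")
```

An explicit dictionary passed to `.map()` would be stricter than the suffix rule, because unexpected region names would surface as missing values rather than being silently classified.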
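
For the banking task, the sketch below recomputes ages from birth dates. The helper name `age_on` is hypothetical, and it assumes the birth dates can be parsed by `pd.to_datetime`. Subtracting years alone overstates the age of anyone whose birthday has not yet occurred this year, so the helper corrects for that.

```python
import datetime as dt

import pandas as pd


def age_on(birth_dates, today):
    """Return age in whole years on `today` for each birth date (hypothetical helper)."""
    birth = pd.to_datetime(pd.Series(birth_dates))
    # True where this year's birthday has already happened (or is today).
    had_birthday = (birth.dt.month < today.month) | (
        (birth.dt.month == today.month) & (birth.dt.day <= today.day)
    )
    # Subtract one year for anyone still waiting for this year's birthday.
    return today.year - birth.dt.year - (~had_birthday).astype(int)


# Fixed reference date so the example is reproducible.
example_ages = age_on(
    ["1990-06-15", "1990-06-16", "2000-01-01", "1985-12-31"],
    dt.date(2024, 6, 15),
)
```

Applied to the course data, this might look like `banking['Age'] = age_on(banking['birth_date'], dt.date.today())`, assuming the column names given in the task.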
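
For the category cleanup in the restaurants task, the sketch below snaps misspelled values onto a list of known categories. It uses `difflib` from the standard library in place of fuzzywuzzy so that it runs without extra installs; the category list, the dirty values, the helper name `closest_category`, and the 0.8 cutoff are all illustrative assumptions. The record linkage step is not shown here.

```python
import difflib

import pandas as pd

# Hypothetical clean categories (e.g. taken from restaurants) and dirty values.
valid_types = ["american", "asian", "italian"]
dirty = pd.Series(["itallian", "Amurican", "asiane", "italian", "xyz"])


def closest_category(value, categories, cutoff=0.8):
    """Return the most similar category, or the original value if none is close."""
    matches = difflib.get_close_matches(
        value.strip().lower(), categories, n=1, cutoff=cutoff
    )
    return matches[0] if matches else value


# Replace each value with its closest known category where one exists.
cleaned = dirty.map(lambda v: closest_category(v, valid_types))
```

Values with no close match are left unchanged, which keeps unexpected entries visible for manual review instead of forcing them into the wrong category.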
