Cleaning Data in Python

    Run the hidden code cell below to import the data used in this course.

    # Import the course packages
    import pandas as pd
    import numpy as np
    import datetime as dt
    import matplotlib.pyplot as plt
    import missingno as msno
    import fuzzywuzzy
    import recordlinkage 
    
    # Import the course datasets
    ride_sharing = pd.read_csv('datasets/ride_sharing_new.csv', index_col = 'Unnamed: 0')
    airlines = pd.read_csv('datasets/airlines_final.csv',  index_col = 'Unnamed: 0')
    banking = pd.read_csv('datasets/banking_dirty.csv', index_col = 'Unnamed: 0')
    restaurants = pd.read_csv('datasets/restaurants_L2.csv', index_col = 'Unnamed: 0')
    restaurants_new = pd.read_csv('datasets/restaurants_L2_dirty.csv', index_col = 'Unnamed: 0')
    
    

    Explore Datasets

    Use the DataFrames imported in the first cell to explore the data and practice your skills!

    • For each DataFrame, inspect the data types of each column and, where needed, clean and convert columns into the correct data type. You should also rename any columns to have more descriptive titles.
    • Identify and remove all the duplicate rows in ride_sharing.
    • Inspect the unique values of all the columns in airlines and clean any inconsistencies.
    • For the airlines DataFrame, create a new column called International from dest_region, where values representing US regions map to False and all other regions map to True.
    • The banking DataFrame contains out-of-date ages. Update the Age column using today's date and the birth_date column.
    • Clean the restaurants_new DataFrame so that it better matches the categories in the city and type columns of the restaurants DataFrame. Afterward, given typos in restaurant names, use record linkage to generate possible pairs of rows between restaurants and restaurants_new using the criteria you think are best.
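
Two of the tasks above, deriving International from dest_region and recomputing Age from birth_date, can be sketched on toy data. The rows below are made up, and the set of US region labels is an assumption; inspect `airlines['dest_region'].unique()` for the actual values before building the mapping. The helper `age_on` is a hypothetical name introduced only for this illustration.

```python
import datetime as dt
import pandas as pd

# Toy stand-in for airlines; these region labels are assumed, not taken from the dataset
airlines_demo = pd.DataFrame({'dest_region': ['East US', 'Europe', 'West US', 'Asia']})

# Hypothetical set of labels that denote US regions
us_regions = {'East US', 'Midwest US', 'West US'}

# True when the destination lies outside the US, False otherwise
airlines_demo['International'] = ~airlines_demo['dest_region'].isin(us_regions)

# Toy stand-in for banking; the birth dates and stale ages are invented
banking_demo = pd.DataFrame({'birth_date': ['1962-06-09', '1990-11-20'], 'Age': [50, 20]})
banking_demo['birth_date'] = pd.to_datetime(banking_demo['birth_date'])

def age_on(birth_dates, today):
    """Return completed years between each birth date and `today`."""
    # The birthday has passed if its month is earlier, or same month and day reached
    had_birthday = (birth_dates.dt.month < today.month) | (
        (birth_dates.dt.month == today.month) & (birth_dates.dt.day <= today.day)
    )
    # Subtract one year when the birthday has not yet occurred this year
    return today.year - birth_dates.dt.year - (~had_birthday).astype(int)

# Overwrite the stale ages using today's date
banking_demo['Age'] = age_on(banking_demo['birth_date'], dt.date.today())
```

Comparing month and day explicitly avoids the off-by-one error that a plain year subtraction gives for people whose birthday has not yet come around this year.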

    Data Cleaning Examples

    1. Change a column's data type
    2. Remove letters, symbols, or words from a column
    3. Replace out-of-range values, for example setting every value above 10 to 5
    4. Convert dates
    5. View duplicates
    6. Remove duplicates
    7. View unique values
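
The seven steps listed above can be sketched on a small made-up DataFrame. Every column name and value here is illustrative and not taken from the course datasets, so treat this as a pattern to adapt, not as the cleaning code for any particular file.

```python
import pandas as pd

# Made-up data: every column name and value below is illustrative only
df = pd.DataFrame({
    'duration': ['12 minutes', '5 minutes', '12 minutes', '31 minutes'],
    'rating': [4, 11, 4, 3],
    'ride_date': ['2020-01-05', '2020-02-17', '2020-01-05', '2020-03-09'],
    'user_type': ['1', '2', '1', '2'],
})

# 1. Change a data type: codes stored as strings become a category
df['user_type'] = df['user_type'].astype('category')

# 2. Strip a word from a column, then convert what remains to integers
df['duration'] = df['duration'].str.replace(' minutes', '', regex=False).astype(int)

# 3. Replace out-of-range values: ratings above 10 are set to 5
df.loc[df['rating'] > 10, 'rating'] = 5

# 4. Convert date strings to datetime
df['ride_date'] = pd.to_datetime(df['ride_date'])

# 5. View duplicates: keep=False flags every copy, not only the later ones
duplicates = df[df.duplicated(keep=False)]

# 6. Remove duplicates, keeping the first occurrence of each row
deduped = df.drop_duplicates().reset_index(drop=True)

# 7. View the unique values of a column
unique_durations = deduped['duration'].unique()
```

The order matters in this sketch: the duplicate checks in steps 5 and 6 run after the cleaning steps, so rows that differ only in formatting before cleaning would still be caught.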