Course notes: Working with Categorical Data in Python

    Course Notes

    Use this workspace to take notes, store code snippets, or build your own interactive cheatsheet! For courses that use data, the datasets will be available in the datasets folder.

    # Import any packages you want to use here
    import pandas as pd

    Take Notes

    Definition of categorical data in Python:

    • Categorical data is data whose values fall into a limited set of categories or groups; it is usually non-numeric.
    • It is often represented by words or symbols.

    Types of categorical data:

    • Categorical data can be divided into nominal and ordinal data.
    • Nominal data cannot be ranked or ordered, such as eye color or gender.
    • Ordinal data can be ranked or ordered, such as academic grades or levels of satisfaction.
    • Nominal data is often used in surveys or when collecting demographic information.
    • Ordinal data is used in rating scales or when measuring attitudes or opinions.
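
In pandas, the nominal/ordinal distinction corresponds roughly to whether a categorical is marked as ordered. The sketch below uses made-up values (not the course dataset) and assumes the order low < medium < high purely for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Nominal: no order among categories (made-up eye colors)
eye_color = pd.Series(["blue", "brown", "green", "brown"], dtype="category")

# Ordinal: an explicit order, low < medium < high (made-up ratings)
rating_type = CategoricalDtype(categories=["low", "medium", "high"], ordered=True)
satisfaction = pd.Series(["low", "high", "medium", "low"]).astype(rating_type)

print(eye_color.cat.ordered)             # False
print(satisfaction.cat.ordered)          # True
# Comparisons only make sense for ordered categoricals
print((satisfaction > "low").tolist())   # [False, True, True, False]
```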


    # Read the data from a CSV file
    adult = pd.read_csv('datasets/adult.csv')
    adult
    
    # Get a summary of the "Above/Below 50k" column
    print(adult["Above/Below 50k"].describe())

    Note that value_counts() gives a different result when normalize=True is passed.

    With normalize=True, the output contains the relative frequency of each unique value instead of its count.

    # Print the frequency table of "Above/Below 50k"
    print(adult["Above/Below 50k"].value_counts())
    # Print the relative frequencies by passing normalize=True
    print(adult["Above/Below 50k"].value_counts(normalize=True))
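
To see the difference between counts and relative frequencies without the dataset, here is a minimal sketch on a small made-up Series. The labels "<=50K" and ">50K" are assumptions standing in for the real column values.

```python
import pandas as pd

# Made-up labels standing in for the "Above/Below 50k" column
income = pd.Series(["<=50K", "<=50K", ">50K", "<=50K"], name="Above/Below 50k")

counts = income.value_counts()                 # absolute counts
shares = income.value_counts(normalize=True)   # proportions that sum to 1

print(counts["<=50K"], counts[">50K"])   # 3 1
print(shares["<=50K"], shares[">50K"])   # 0.75 0.25
```

Multiplying the normalized result by 100 gives percentages.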

    What is the difference between dtype and dtypes?

    dtype is used with a Series.
    dtypes is used with a DataFrame. The difference becomes clear when we apply both.
    adult = pd.read_csv('datasets/adult.csv')
    adult.head(3)
    # Use dtypes on the DataFrame: one dtype per column
    adult.dtypes
    # Use dtype on a single column (a Series); dtype('O') means object
    adult["Marital Status"].dtype
    # Convert the "Marital Status" column to the category dtype
    adult["Marital Status"] = adult["Marital Status"].astype("category")
    adult["Marital Status"].dtype
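
The same steps can be tried on a small stand-in frame. This is a sketch only: the two columns and their values are hypothetical, and the real adult.csv has more columns.

```python
import pandas as pd

# Hypothetical two-column frame used in place of adult.csv
df = pd.DataFrame({
    "Marital Status": ["Married", "Divorced", "Married"],
    "Age": [39, 50, 38],
})

print(df.dtypes)                    # DataFrame.dtypes: a Series with one dtype per column
print(df["Marital Status"].dtype)   # Series.dtype: a single dtype

# Convert the text column to the category dtype
df["Marital Status"] = df["Marital Status"].astype("category")
print(df["Marital Status"].dtype)                     # category
print(df["Marital Status"].cat.categories.tolist())   # ['Divorced', 'Married']
```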

    How to create a categorical Series?

    There are two ways:

    • Use pd.Series with dtype="category"
    • or use pd.Categorical
    • With pd.Categorical, pass the categories=[...] list and ordered=True to define an ordered categorical
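
Both ways can be sketched as follows. The survey answers and their order are made up for illustration; wrapping the pd.Categorical result in pd.Series is one common way to get a Series from it.

```python
import pandas as pd

# Made-up survey answers, for illustration only
answers = ["agree", "neutral", "disagree", "agree"]

# Way 1: pd.Series with dtype="category" (unordered, categories inferred)
s1 = pd.Series(answers, dtype="category")

# Way 2: pd.Categorical with explicit categories and ordered=True
s2 = pd.Series(
    pd.Categorical(answers, categories=["disagree", "neutral", "agree"], ordered=True)
)

print(s1.cat.categories.tolist())  # ['agree', 'disagree', 'neutral']
print(s2.cat.categories.tolist())  # ['disagree', 'neutral', 'agree']
print(s2.cat.ordered)              # True
```

With the first way the categories are inferred and sorted alphabetically; the second way keeps exactly the order given in categories.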