Summary Notes: Data Manipulation with pandas
1. Inspecting a DataFrame:
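- .head(): Returns the first rows of the DataFrame (five by default).
- .info(): Prints column names, data types, and non-null counts, which reveal missing values.
- .shape: Returns the number of rows and columns as a tuple. It is an attribute, so it takes no parentheses.
- .describe(): Calculates summary statistics (count, mean, standard deviation, min, quartiles, max) for each numeric column.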
- Example:
homelessness.head()
homelessness.info()
homelessness.shape
homelessness.describe()
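The inspection calls above can be tried on a small stand-in table. This is only a sketch: the real homelessness dataset is not included in these notes, so the rows and values below are made up, and the column names are assumed from the examples later in the notes.

```python
import pandas as pd

# Hypothetical stand-in for the homelessness data; all values are made up.
homelessness = pd.DataFrame({
    "region": ["Pacific", "Pacific", "Mountain", "Mountain"],
    "state": ["A", "B", "C", "D"],
    "individuals": [100.0, 50.0, 75.0, 25.0],
    "family_members": [10.0, 40.0, 30.0, 20.0],
})

first_rows = homelessness.head(2)    # first 2 rows; the default is 5
n_rows, n_cols = homelessness.shape  # attribute, so no parentheses
stats = homelessness.describe()      # numeric columns only by default
homelessness.info()                  # prints a summary and returns None

print(first_rows)
print(n_rows, n_cols)
print(stats.loc["mean", "individuals"])
```

Note that .describe() skips the text columns (region and state) here, because by default it summarizes only numeric columns.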
2. Parts of a DataFrame:
- .values: A two-dimensional NumPy array of values.
- .columns: An index of column names.
- .index: An index for rows (row numbers or names).
- Example:
homelessness.values
homelessness.columns
homelessness.index
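A short sketch of the three parts, again on a made-up stand-in table (the data and its values are assumptions for illustration, not the real dataset):

```python
import pandas as pd

# Hypothetical stand-in for the homelessness data; all values are made up.
homelessness = pd.DataFrame({
    "state": ["A", "B", "C"],
    "individuals": [100.0, 50.0, 75.0],
    "family_members": [10.0, 40.0, 30.0],
})

values = homelessness.values    # 2-D NumPy array holding the cell values
columns = homelessness.columns  # Index of column names
index = homelessness.index      # row index; a default 0..n-1 range here

print(type(values).__name__, values.shape)
print(list(columns))
print(list(index))
```

Because this table mixes text and numbers, the array in .values has the generic object dtype. The pandas documentation recommends .to_numpy() as the newer equivalent of .values.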
3. Sorting Rows:
- Sorting by one column:
df.sort_values("column_name")
- Sorting by multiple columns:
df.sort_values(["col_name1", "col_name2"])
- Example:
homelessness.sort_values("individuals")
homelessness.sort_values(["region", "family_members"], ascending=[True, False])
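To see how the ascending list pairs up with the column list, here is a minimal sketch on a made-up stand-in table (the values and the column names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the homelessness data; all values are made up.
homelessness = pd.DataFrame({
    "region": ["Pacific", "Pacific", "Mountain", "Mountain"],
    "state": ["A", "B", "C", "D"],
    "individuals": [100, 50, 75, 25],
    "family_members": [10, 40, 30, 20],
})

# One column, ascending order (the default).
by_individuals = homelessness.sort_values("individuals")

# Two columns: region A-Z, then family_members from largest to smallest
# within each region. Each entry of `ascending` matches one sort column.
by_region_family = homelessness.sort_values(
    ["region", "family_members"], ascending=[True, False]
)

print(by_individuals["state"].tolist())    # ['D', 'B', 'C', 'A']
print(by_region_family["state"].tolist())  # ['C', 'D', 'B', 'A']
```

sort_values returns a new DataFrame and leaves the original unchanged, so the result has to be assigned to a variable to be kept.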
4. Subsetting Columns:
- Selecting a single column:
df["column_name"]
- Selecting multiple columns:
df[["col_name1", "col_name2"]]
- Example:
individuals = homelessness["individuals"]
state_fam = homelessness[["state", "family_members"]]
ind_state = homelessness[["individuals", "state"]]
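The difference between single and double brackets is easiest to see from the type of the result. A minimal sketch on a made-up stand-in table (values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the homelessness data; all values are made up.
homelessness = pd.DataFrame({
    "state": ["A", "B", "C"],
    "individuals": [100, 50, 75],
    "family_members": [10, 40, 30],
})

# Single brackets with one name return a Series.
individuals = homelessness["individuals"]

# Double brackets (a list of names) return a DataFrame,
# with columns in the order they are listed.
state_fam = homelessness[["state", "family_members"]]
ind_state = homelessness[["individuals", "state"]]

print(type(individuals).__name__)  # Series
print(type(state_fam).__name__)    # DataFrame
print(list(ind_state.columns))     # ['individuals', 'state']
```

The outer brackets do the subsetting, and the inner brackets build the list of column names, which is why the column order in the result follows the list.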