Nadia Rizky Hairunnisa

Project: Analyzing Crime in Los Angeles


Los Angeles, California 😎. The City of Angels. Tinseltown. The Entertainment Capital of the World!

It is known for its warm weather, palm trees, sprawling coastline, and Hollywood, as well as for producing some of the most iconic films and songs. However, as in any densely populated city, life isn't always glamorous, and there can be a large volume of crime. That's where you can help!

You have been asked to support the Los Angeles Police Department (LAPD) by analyzing crime data to identify patterns in criminal behavior. They plan to use your insights to allocate resources effectively to tackle various crimes in different areas.

The Data

They have provided you with a single dataset to use. A summary and preview are provided below.

It is a modified version of the original data, which is publicly available from Los Angeles Open Data.


'DR_NO': Division of Records Number. Official file number made up of a 2-digit year, area ID, and 5 digits.
'Date Rptd': Date reported, MM/DD/YYYY.
'DATE OCC': Date of occurrence, MM/DD/YYYY.
'TIME OCC': Time of occurrence, in 24-hour military time.
'AREA NAME': Each of the 21 Geographic Areas, or Patrol Divisions, is given a name that references a landmark or the surrounding community it is responsible for. For example, the 77th Street Division is located at the intersection of South Broadway and 77th Street and serves neighborhoods in South Los Angeles.
'Crm Cd Desc': Indicates the crime committed.
'Vict Age': Victim's age in years.
'Vict Sex': Victim's sex (F: Female, M: Male, X: Unknown).
'Vict Descent': Victim's descent:
  • A - Other Asian
  • B - Black
  • C - Chinese
  • D - Cambodian
  • F - Filipino
  • G - Guamanian
  • H - Hispanic/Latin/Mexican
  • I - American Indian/Alaskan Native
  • J - Japanese
  • K - Korean
  • L - Laotian
  • O - Other
  • P - Pacific Islander
  • S - Samoan
  • U - Hawaiian
  • V - Vietnamese
  • W - White
  • X - Unknown
  • Z - Asian Indian
'Weapon Desc': Description of the weapon used (if applicable).
'Status Desc': Crime status.
'LOCATION': Street address of the crime.
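Two of the coded columns can also be decoded in code. The sketch below is illustrative only: `split_dr_no` is a hypothetical helper that assumes DR_NO is a 9-digit number (2-digit year, 2-digit area ID, 5-digit sequence; the 2-digit width of the area ID is an assumption, not something the dictionary states), and `DESCENT_LABELS` simply transcribes the descent codes listed above into a dictionary for use with `Series.map`. The example values are made up.

```python
import pandas as pd

# Descent codes transcribed from the data dictionary above
DESCENT_LABELS = {
    "A": "Other Asian",
    "B": "Black",
    "C": "Chinese",
    "D": "Cambodian",
    "F": "Filipino",
    "G": "Guamanian",
    "H": "Hispanic/Latin/Mexican",
    "I": "American Indian/Alaskan Native",
    "J": "Japanese",
    "K": "Korean",
    "L": "Laotian",
    "O": "Other",
    "P": "Pacific Islander",
    "S": "Samoan",
    "U": "Hawaiian",
    "V": "Vietnamese",
    "W": "White",
    "X": "Unknown",
    "Z": "Asian Indian",
}


def split_dr_no(dr_no):
    """Hypothetical helper: split a DR_NO into (year, area_id, sequence).

    ASSUMPTION: the number is 9 digits laid out as YY + AA + NNNNN.
    """
    digits = str(dr_no).zfill(9)  # pad in case leading zeros were lost
    return int(digits[:2]), int(digits[2:4]), int(digits[4:])


# Made-up values, not taken from the dataset
print(split_dr_no(220314085))  # (22, 3, 14085)
sample = pd.Series(["H", "W", "X", "B"])
print(sample.map(DESCENT_LABELS).tolist())
```

Applied to the real data, `crimes['Vict Descent'].map(DESCENT_LABELS)` would return readable labels, with any code missing from the dictionary becoming NaN.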
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

crimes = pd.read_csv("crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str})
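The two `read_csv` options do different jobs: `parse_dates` turns the two date columns into `datetime64` values, and `dtype={"TIME OCC": str}` keeps the times as text so that, assuming the file stores them zero-padded, a value such as `0005` (12:05 am) is not collapsed to the integer `5`. The sketch below checks this on a tiny hypothetical CSV held in memory with `io.StringIO`; the rows are invented for illustration, not real records.

```python
import io

import pandas as pd

# Hypothetical three-row extract in the same layout as crimes.csv (made-up values)
csv_text = """DR_NO,Date Rptd,DATE OCC,TIME OCC,AREA NAME
200100001,07/22/2022,05/12/2020,1110,Central
200100002,08/26/2022,08/26/2022,0005,Olympic
200100003,08/16/2022,08/12/2022,1200,Hollywood
"""

# Same options as the cell above: parse the dates, keep TIME OCC as text
demo = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date Rptd", "DATE OCC"],
    dtype={"TIME OCC": str},
)
print(demo.dtypes)

# Without dtype=str, TIME OCC is parsed as integers and "0005" becomes 5
demo_int = pd.read_csv(io.StringIO(csv_text))
print(demo_int["TIME OCC"].tolist())  # [1110, 5, 1200]
```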

Which hour has the highest frequency of crimes? Store as an integer variable called peak_crime_hour.

Because we read the TIME OCC column with dtype=str, its values are strings in 'HHMM' format rather than datetime objects. Let's begin by converting them to datetime format; afterward, we can extract the hour and count the number of crimes that occurred in each hour.

# Parse the 'TIME OCC' strings ('HHMM') as datetimes
crimes['TIME OCC'] = pd.to_datetime(crimes['TIME OCC'], format='%H%M')
# Extract the hour and store it in a new 'HOUR OCC' column
crimes['HOUR OCC'] = crimes['TIME OCC'].dt.hour
# Count the number of crimes for each hour
hour_counts = crimes['HOUR OCC'].value_counts()
print(hour_counts)
# Store the most frequent hour as a built-in Python int, as the task requires
peak_crime_hour = int(hour_counts.idxmax())
print(f"The peak crime hour happens at {peak_crime_hour}:00 (24-hour format)")
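Since only the hour is needed, an alternative that skips the datetime conversion is to slice the first two characters of each 'HHMM' string. The sketch below compares both approaches on a few made-up times, assuming the strings are zero-padded to four characters, and casts the result with `int()` because pandas returns a NumPy integer rather than a built-in `int`.

```python
import pandas as pd

# Made-up 'HHMM' strings; ASSUMPTION: zero-padded to four characters
times = pd.Series(["1110", "0005", "1200", "1230", "2315", "1201"])

# Approach 1: parse as datetimes (the date part defaults to 1900-01-01)
hours_dt = pd.to_datetime(times, format="%H%M").dt.hour

# Approach 2: slice the two hour digits and convert them to integers
hours_str = times.str[:2].astype(int)

# Both approaches should agree hour by hour
assert (hours_dt == hours_str).all()

# idxmax() returns the hour with the largest count;
# int() turns the NumPy integer into a plain Python int
peak = int(hours_str.value_counts().idxmax())
print(peak)  # 12
```

Slicing avoids the placeholder 1900-01-01 date that `pd.to_datetime` attaches, but it silently breaks if any time is not zero-padded, which the datetime parser would instead reject with an error.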

This is interesting: the peak hour falls right in the middle of the day, around lunchtime. To help our fellow detectives and the police department, let's visualize the distribution of total crimes per hour 🕵️

# Visualize total crimes per hour using countplot
sns.countplot(data=crimes, x='HOUR OCC')

plt.xlabel("Hour", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.title("Total Crime Cases in Los Angeles per Hour", fontsize=15)
plt.show()
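`countplot` only draws the hours that actually appear in the data. If you want all 24 hours on the axis (with zero-height bars for any empty hour) and the peak hour highlighted, one option is to count with `value_counts()`, `reindex` to `range(24)`, and draw the bars with plain matplotlib. This is a sketch using a short made-up series as a stand-in for `crimes['HOUR OCC']`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for crimes['HOUR OCC']; these values are made up
hour_occ = pd.Series([12, 12, 12, 0, 1, 23, 18, 18])

# Count per hour, then reindex so hours with no crimes appear as 0
hourly = hour_occ.value_counts().reindex(range(24), fill_value=0)
peak_hour = int(hourly.idxmax())

# Highlight the peak hour in a different color
colors = ["tab:red" if h == peak_hour else "tab:blue" for h in hourly.index]

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(hourly.index, hourly.values, color=colors)
ax.set_xticks(range(24))
ax.set_xlabel("Hour", fontsize=12)
ax.set_ylabel("Number of Crimes", fontsize=12)
ax.set_title("Total Crime Cases in Los Angeles per Hour", fontsize=15)
plt.show()
```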

Which area has the largest frequency of night crimes (crimes committed between 10pm and 3:59am)? Save as a string variable called peak_night_crime_location.

To answer this question, we can filter the dataset using the HOUR OCC column we created earlier, then use value_counts() to find the area with the highest number of night crimes.

# Keep rows from 10 PM (hour 22) through 3:59 AM (hours 0-3); hour 4 is excluded
night_crimes = crimes[(crimes["HOUR OCC"] >= 22) | (crimes["HOUR OCC"] < 4)]
# Calculate the frequency
night_crimes['AREA NAME'].value_counts()

So it appears that the Central area has the highest number of night crimes in Los Angeles! Let's store this information as instructed; it will be valuable for our fellow detectives and police officers 🕵️👮

# Store the location with the highest frequency of night crimes
peak_night_crime_location = night_crimes['AREA NAME'].value_counts().index[0]
print("Area in Los Angeles with the largest frequency of night crimes is", peak_night_crime_location)
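Because the night window wraps past midnight, the two conditions must be combined with `|` (or) rather than `&` (and): no hour is both at least 22 and below 4. A small helper makes the boundaries explicit. `is_night_hour` is a hypothetical name used only for this sketch; treating the end hour as exclusive is what makes 3:59 am the last minute included.

```python
import pandas as pd


def is_night_hour(hours, start=22, end=4):
    """Hypothetical helper: True for hours in [start, 24) or [0, end).

    'end' is exclusive, so end=4 keeps 03:00-03:59 and drops 04:00-04:59.
    """
    return (hours >= start) | (hours < end)


hours = pd.Series(range(24))
night = hours[is_night_hour(hours)]
print(night.tolist())  # [0, 1, 2, 3, 22, 23]

# Combining the conditions with '&' instead of '|' selects nothing
print(((hours >= 22) & (hours < 4)).sum())  # 0
```

On the real data this would be used as `crimes[is_night_hour(crimes["HOUR OCC"])]`.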

Identify the number of crimes committed against victims by age group (0-17, 18-25, 26-34, 35-44, 45-54, 55-64, 65+). Save as a pandas Series called victim_ages.

Let's begin by looking at how often each age appears.

# Calculate the frequency of 'Vict Age' column
crimes['Vict Age'].value_counts().sort_index()
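To answer the question itself, the ages need to be binned into the requested groups. A minimal sketch of one way to do this is `pd.cut` with explicit bin edges and labels: each bin is right-closed, `include_lowest=True` keeps age 0 in the first group, and any negative or missing ages fall outside the bins and become NaN. The sample ages below are made up, and `victim_ages_demo` is a hypothetical name chosen so it does not clash with the required `victim_ages` variable.

```python
import numpy as np
import pandas as pd

# Right-closed bin edges: (17, 25] -> "18-25", (64, inf] -> "65+"
age_bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
age_labels = ["0-17", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]

# Made-up ages standing in for crimes['Vict Age']
vict_age = pd.Series([0, 17, 18, 25, 26, 34, 35, 44, 45, 54, 55, 64, 65, 90])

# include_lowest=True makes the first bin include its left edge (age 0)
age_group = pd.cut(vict_age, bins=age_bins, labels=age_labels, include_lowest=True)

# sort_index() on the categorical counts follows the label order defined above
victim_ages_demo = age_group.value_counts().sort_index()
print(victim_ages_demo)
```

Applied to the real column, the same call would be `pd.cut(crimes["Vict Age"], bins=age_bins, labels=age_labels, include_lowest=True).value_counts().sort_index()`.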