Giulia Brambilla

Safety first: An analysis of UK major accidents in 2020 to reduce road fatalities

📖 BACKGROUND

We work for the road safety team within the Department for Transport, which is looking into how to reduce the number of serious accidents.

Note that the safety team defines a serious accident as a fatal accident involving 3+ casualties.

The department is trying to learn more about the characteristics of these accidents, so they can brainstorm interventions that could lower the number of deaths.

They have asked for our assistance with answering a number of questions.

💾 THE DATA

We have two sources of information available:

  • A dataset containing data on every reported accident. It is published by the UK Department for Transport and is available here.
  • A lookup file for the 2020 accidents. It describes each column of the accidents dataset and is needed to interpret the coded values correctly.

📌 PROBLEM STATEMENT

Our goal for this project is to create a report that answers the following questions:

  1. What time of day and day of the week do most serious accidents happen?
  2. Are there any patterns in the time of day / day of the week when serious accidents occur?
  3. What characteristics stand out in serious accidents compared with other accidents?
  4. On what areas would you recommend the planning team focus their brainstorming efforts to reduce serious accidents?

Through data cleaning, analysis and visualization, we will answer these questions to help our stakeholders improve road safety. Understanding when and how fatal accidents tend to happen is the first step toward preventing them and saving lives.

📚 LOAD PACKAGES

Let's start by loading all the necessary Python packages.

# Import necessary libraries
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import plotly.express as px
import seaborn as sns 
from matplotlib import rcParams
from datetime import time, timedelta, datetime
from time import mktime
import plotly.graph_objects as go

🗓 LOAD DATAFRAMES

We then load the available datasets. Here is what the first dataframe, covering the recorded accidents, looks like.

# Accidents dataset
accidents = pd.read_csv(r'./data/accident-data.csv')
accidents.head()

And this is the lookup file, with the descriptions of the accidents' fields.

# Lookup dataset
lookup = pd.read_csv(r'./data/road-safety-lookups.csv')
lookup.head()
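Many columns in the accidents data hold numeric codes, so the lookup file is what turns them into readable labels. Below is a minimal sketch of that decoding step on toy data. The column names `field name`, `code/format` and `label`, the helper `decode` and the toy values are assumptions for illustration and should be checked against the actual lookup file.

```python
import pandas as pd

# Toy stand-in for the lookup file (assumed column names and example labels)
lookup_toy = pd.DataFrame({
    "field name": ["accident_severity"] * 3 + ["day_of_week"] * 2,
    "code/format": ["1", "2", "3", "1", "2"],
    "label": ["Fatal", "Serious", "Slight", "Sunday", "Monday"],
})

# Toy stand-in for the accidents data
accidents_toy = pd.DataFrame({"accident_severity": [3, 1, 2, 3]})


def decode(df, lookup, column):
    """Return df[column] with its numeric codes replaced by lookup labels."""
    rows = lookup[lookup["field name"] == column]
    mapping = dict(zip(rows["code/format"].astype(int), rows["label"]))
    return df[column].map(mapping)


labels = decode(accidents_toy, lookup_toy, "accident_severity")
print(labels.tolist())
```

Codes that have no entry in the lookup come back as NaN, which makes gaps in the mapping easy to spot.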

⌛️ EXPLORATORY DATA ANALYSIS

To identify effective strategies for increasing road safety and reducing the number of serious accidents, it is important to understand the circumstances in which they happen. Before diving deeper into the analysis, however, the available data has to be examined and, if necessary, cleaned.

# Quick glimpse at the data
accidents.info()

We have a dataframe of 27 columns and more than 91K rows. Almost none of the values are null, except for a few data points in the longitude and latitude columns. Now let's check whether there are duplicates among the accidents, using each accident's unique reference number.

# Check for duplicates
dups = accidents[accidents.duplicated(['accident_index'])]
print(len(dups))
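For reference, here is a small sketch of both checks (missing values and duplicates) on a toy frame with made-up values. It counts nulls per column with `isna().sum()` and uses `keep=False` so that every copy of a repeated `accident_index` is listed, not only the later ones. The toy data is hypothetical and deliberately contains one duplicate.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one repeated accident_index, two rows without coordinates
toy = pd.DataFrame({
    "accident_index": ["A1", "A2", "A2", "A3"],
    "longitude": [-0.12, np.nan, np.nan, -1.55],
    "latitude": [51.5, np.nan, np.nan, 53.8],
})

# Null count per column, keeping only the columns that have missing values
null_counts = toy.isna().sum()
null_counts = null_counts[null_counts > 0]

# keep=False flags every row that shares its accident_index with another row
dup_rows = toy[toy.duplicated("accident_index", keep=False)]

print(null_counts.to_dict())
print(len(dup_rows))
```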

There are no duplicates, so we can say that there were a total of 91199 recorded road accidents in the UK. But what period does our dataset refer to? Let's quickly explore the accident_year column.

# Look into accidents' year
accidents.accident_year.value_counts()
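The same check can be written as an explicit test and cross-checked against the accident dates. The sketch below uses a toy frame and assumes a `date` column stored as day/month/year text; both the column name and the format are assumptions to verify against the lookup file.

```python
import pandas as pd

# Hypothetical toy data; the `date` column and its dd/mm/yyyy format are assumed
toy = pd.DataFrame({
    "accident_year": [2020, 2020, 2020],
    "date": ["04/02/2020", "27/04/2020", "31/12/2020"],
})

# An explicit format avoids day/month ambiguity when parsing
parsed = pd.to_datetime(toy["date"], format="%d/%m/%Y")

# True if the dataset covers exactly one year
single_year = toy["accident_year"].nunique() == 1

# True if every parsed date falls in the year recorded for that row
years_match = bool((parsed.dt.year == toy["accident_year"]).all())

print(single_year, years_match)
```

Parsing the dates at this point also makes day names and hours available later through the `.dt` accessor.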

All accidents in our dataset therefore happened in 2020.

We know that the transport department wants to focus on serious accidents, which it defines as fatal accidents involving 3+ casualties. For the rest of this analysis, we will therefore focus on this specific subset.

# Select only accidents with 3 or more casualties involved
serious = accidents.loc[accidents['number_of_casualties'] >= 3]
serious.head()

# How many serious accidents in 2020?
print(f'In 2020, there were a total of {serious.shape[0]} serious accidents in the UK.')
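The filter above uses the casualty count alone, while the safety team's definition also requires the accident to be fatal. If the stricter definition is wanted, a severity condition can be combined with the casualty condition. The sketch below works on a toy frame and assumes that `accident_severity` is coded with 1 meaning fatal; that code should be confirmed in the lookup file before applying it to the real data.

```python
import pandas as pd

FATAL = 1  # assumed code for a fatal accident; confirm in the lookup file

# Hypothetical toy data
toy = pd.DataFrame({
    "accident_severity": [1, 1, 3, 2, 1],
    "number_of_casualties": [3, 1, 4, 3, 5],
})

many_casualties = toy["number_of_casualties"] >= 3
fatal = toy["accident_severity"] == FATAL

# Both conditions must hold; .copy() avoids chained-assignment warnings later
serious_strict = toy.loc[many_casualties & fatal].copy()

print(len(toy.loc[many_casualties]), len(serious_strict))
```

On the toy data the casualty-only filter keeps four rows and the combined filter keeps two, which shows how much the choice of definition can change the subset under study.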


