Giulia Brambilla/

Safety first: An analysis of UK major accidents in 2020 to reduce road fatalities

📖 BACKGROUND

We work for the road safety team within the Department for Transport, which is looking into how it can reduce the number of serious accidents.

Note that the safety team defines a serious accident as a fatal accident involving three or more casualties.

The department is trying to learn more about the characteristics of these accidents, so they can brainstorm interventions that could lower the number of deaths.

They have asked for our assistance with answering a number of questions.

💾 THE DATA

We have two sources of information available:

  • A dataset containing data on every reported accident.

This dataset has been published by the UK Department for Transport and it is available here.

  • A lookup file for 2020's accidents.

This file describes each column of the accidents dataset and will help us interpret the accident data correctly.

📌 PROBLEM STATEMENT

Our goal for this project is to create a report that answers the following questions:

  1. What time of day and day of the week do most serious accidents happen?
  2. Are there any patterns in the time of day / day of the week when serious accidents occur?
  3. What characteristics stand out in serious accidents compared with other accidents?
  4. On what areas would you recommend the planning team focus their brainstorming efforts to reduce serious accidents?

Through data cleaning, analysis and visualization, we will answer these questions to help our stakeholders improve road safety: understanding when and how fatal accidents tend to happen is the first step towards preventing them and saving lives.

📚 LOAD PACKAGES

Let's start by loading all the necessary Python packages.

🗓 LOAD DATAFRAMES

Next, we load the available datasets. This is what the first dataframe, containing the recorded accidents, looks like.

And this is the lookup file, which describes the fields of the accidents dataset.
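As a minimal sketch of the loading step, assuming both sources are CSV files read with pandas: the file name in the comment, the sample columns and the lookup layout are placeholders for illustration, not the actual names used in this project. Small in-memory samples stand in for the real files so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for the real accidents file (hypothetical sample rows and columns).
ACCIDENTS_CSV = io.StringIO(
    "accident_index,accident_year,day_of_week,time\n"
    "2020010001,2020,7,23:15\n"
    "2020010002,2020,2,08:40\n"
)

# Stand-in for the lookup file (hypothetical layout).
LOOKUP_CSV = io.StringIO(
    "field name,code/format,label\n"
    "day_of_week,1,Sunday\n"
    "day_of_week,7,Saturday\n"
)


def load_csv(source):
    """Read a CSV from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# With the real files this would be e.g. load_csv("accidents.csv") (name assumed).
accidents = load_csv(ACCIDENTS_CSV)
lookup = load_csv(LOOKUP_CSV)

print(accidents.head())
print(lookup.head())
```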

⌛️ EXPLORATORY DATA ANALYSIS

To identify effective strategies for increasing road safety and reducing the number of serious accidents, it is important to understand the circumstances in which they occur. Before diving deeper into the analysis, however, the available data has to be examined and, if necessary, properly cleaned.

We have a dataframe of 27 columns and more than 91K rows. Almost none of the values are null, except for a few data points in the longitude and latitude columns. Now let's check whether there are duplicates among the accidents. To do this we will use each accident's unique reference number.
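The checks described above could look roughly like this. It is only a sketch on a tiny made-up sample: the column name `accident_index` for the unique reference number is an assumption, and the real dataframe has 27 columns rather than three.

```python
import pandas as pd

# Hypothetical sample; column names are assumed for illustration.
accidents = pd.DataFrame(
    {
        "accident_index": ["A1", "A2", "A3", "A4"],
        "longitude": [-0.12, None, -1.55, -2.24],
        "latitude": [51.50, None, 53.80, 53.48],
    }
)

# Size of the dataframe.
n_rows, n_cols = accidents.shape

# Number of missing values in each column.
null_counts = accidents.isna().sum()

# Number of rows whose reference number already appeared earlier.
n_duplicates = accidents["accident_index"].duplicated().sum()

print(f"{n_rows} rows, {n_cols} columns")
print(null_counts[null_counts > 0])
print(f"Duplicated reference numbers: {n_duplicates}")
```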

There are no duplicates. We can therefore say that there were a total of 91,199 recorded road accidents in the UK. But what period does our dataset refer to? Let's quickly explore the accident_year column.

The column contains only the value 2020, which means that all accidents in our dataset happened during the year 2020.
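One way to run this check is to count the rows per year, as sketched below on made-up data. Only the column name `accident_year` comes from the dataset; the sample values are illustrative.

```python
import pandas as pd

# Hypothetical sample standing in for the accidents dataframe.
accidents = pd.DataFrame({"accident_year": [2020, 2020, 2020]})

# Number of rows per year, ordered by year.
years = accidents["accident_year"].value_counts().sort_index()
print(years)

# True when every row belongs to 2020.
covers_single_year = years.index.tolist() == [2020]
print(covers_single_year)
```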

We know that the transport department wants to focus on serious accidents, and that a serious accident is defined as a fatal accident involving 3+ casualties. Therefore, for the rest of this analysis, we will focus our work on this specific subset.
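A possible way to select this subset is sketched below, following the definition stated above. The column names `accident_severity` and `number_of_casualties`, and the code 1 for "Fatal", are assumptions that should be checked against the lookup file; the helper `select_serious` is hypothetical.

```python
import pandas as pd

FATAL = 1  # assumed code for "Fatal" in accident_severity; verify in the lookup

# Hypothetical sample; column names are assumed for illustration.
accidents = pd.DataFrame(
    {
        "accident_index": ["A1", "A2", "A3", "A4"],
        "accident_severity": [1, 1, 3, 2],
        "number_of_casualties": [4, 1, 5, 3],
    }
)


def select_serious(df, min_casualties=3):
    """Keep fatal accidents with at least `min_casualties` casualties."""
    is_fatal = df["accident_severity"] == FATAL
    many_casualties = df["number_of_casualties"] >= min_casualties
    return df.loc[is_fatal & many_casualties].copy()


serious = select_serious(accidents)
print(serious)
```

Returning a copy keeps later column assignments on the subset from triggering pandas' chained-assignment warnings.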

❓ Question 1: When do most serious accidents happen?

The first question is: What time of day and day of the week do most serious accidents happen? Let's investigate and answer this question using the data we have available. In this part we will explore when serious accidents were most frequent in the UK in 2020.

Let's start by analysing the days of the week. As we can see in the lookup, the days are represented by numbers in our accident dataframe, from 1 (Sunday) to 7 (Saturday). We will convert them into day names for clarity and create a bar chart showing the number of accidents that happened on each day of the week.
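The conversion and the counts behind the bar chart can be sketched as follows. The 1 (Sunday) to 7 (Saturday) coding comes from the lookup; the column name `day_of_week` and the sample values are assumptions.

```python
import pandas as pd

# Day coding from the lookup: 1 is Sunday, 7 is Saturday.
DAY_NAMES = {
    1: "Sunday",
    2: "Monday",
    3: "Tuesday",
    4: "Wednesday",
    5: "Thursday",
    6: "Friday",
    7: "Saturday",
}
WEEK_ORDER = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Hypothetical sample standing in for the serious-accidents subset.
serious = pd.DataFrame({"day_of_week": [7, 7, 6, 1, 2, 7, 6]})

# Replace the numeric codes with readable day names.
serious["day_name"] = serious["day_of_week"].map(DAY_NAMES)

# Count accidents per day and keep calendar order for plotting.
counts = serious["day_name"].value_counts().reindex(WEEK_ORDER, fill_value=0)
print(counts)
```

With matplotlib installed, `counts.plot.bar()` would draw the bar chart from these counts.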

It seems that serious accidents happen most often on Saturday, followed by Friday and Sunday. This suggests that fatal accidents with at least 3 casualties are more frequent around the weekend. Let's look at the numbers as a sanity check to confirm this impression.

Indeed, we can confirm that Saturday is the day of the week with the most serious accidents recorded, accounting for 17% of them.

We can also say that during 2020 most serious accidents happened around the weekend. The top three days were:

  1. Saturday: 820 fatal accidents (17%)
  2. Friday: 798 fatal accidents (16.6%)
  3. Sunday: 687 fatal accidents (14.3%)

Together, these three days account for 47.8% of serious accidents.
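Shares like the ones above can be derived from the per-day counts, as in the sketch below. The sample data is made up purely to show the calculation and does not reproduce the figures in this report.

```python
import pandas as pd

# Hypothetical day names for ten serious accidents.
day_names = pd.Series(
    ["Saturday"] * 4 + ["Friday"] * 3 + ["Sunday"] * 2 + ["Monday"]
)

# Accidents per day, most frequent first.
counts = day_names.value_counts()

# Share of each day as a percentage of all serious accidents.
shares = (counts / counts.sum() * 100).round(1)

summary = pd.DataFrame({"accidents": counts, "share_pct": shares})
print(summary)

# Combined share of the three days with the most accidents.
top3_share = round(counts.nlargest(3).sum() / counts.sum() * 100, 1)
print(f"Top three days: {top3_share}% of serious accidents")
```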

Monday, by contrast, is the day with the fewest serious accidents recorded.

Now we will look into the time of day that records the highest number of serious accidents.

As we saw above when glancing at the dataframe, the time column has the object datatype, so we should convert it into a more suitable type, such as a datetime, to manipulate it more easily.
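A minimal sketch of this conversion, assuming the time values are strings in `HH:MM` format (the sample values are made up, and the `hour` column is added here only for illustration):

```python
import pandas as pd

# Hypothetical sample; the times are stored as plain strings.
serious = pd.DataFrame({"time": ["23:15", "08:40", "17:05", "17:50"]})

# Parse the strings as datetimes. Only the time part is meaningful:
# pandas fills in a default date (1900-01-01), which we ignore.
serious["time"] = pd.to_datetime(serious["time"], format="%H:%M")

# Extract the hour of day to group accidents by time of day.
serious["hour"] = serious["time"].dt.hour

# Number of accidents per hour, ordered by hour.
accidents_per_hour = serious["hour"].value_counts().sort_index()
print(accidents_per_hour)
```

Grouping by hour is one option among several; coarser bins such as morning, afternoon and night could be built from the same `hour` column.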