Hunting the Saturday Night Killer

    Summary

    The road safety team within the Department for Transport wants to understand and minimize the number of major incidents, which are defined as fatal accidents involving 3+ casualties.

    Like most serial killers, major incidents leave behind not only a trail of destruction, but also data that could reveal their patterns of attack and ultimately lead to their neutralization.

    Using accident data from 2020, the profile resulting from the analysis in this report shows that:

    • Major incidents are 3 times deadlier than all other accidents
    • Major incidents frequent single carriageways in rural areas
    • Major incidents peak in late summer, at weekends, and in the late afternoon
    • Major incidents strike in broad daylight, in fine weather, and on roads with a 60 mph limit
    • Saturday at 10pm is the deadliest time by sheer number of accidents and casualties
    • Major incidents occurring at this time are more influenced by external conditions
    • Location and time are the most important predictors of major incidents

    The seasonality of major incidents suggests that they are influenced both by behavioural factors not recorded in the data (e.g., driver impairment) and by external factors that are recorded (e.g., lighting and weather conditions).

    Hence, a two-pronged strategy aiming to address both behavioural and external factors involved in major incidents is recommended. Said strategy should consider the following actions:

    • Extend the analysis of accident data to previous years
    • Make the analysis of accident data an ongoing task
    • Include behavioural factors such as driver impairment, driver experience, and vehicle condition in the accident data
    • Counteract behavioural factors by increasing drivers' awareness of major incidents
    • Minimize potentially hazardous external conditions at specific known locations and times of the deadliest major incidents

    Context

    The road safety team within the Department for Transport is looking into how it can reduce the number of major incidents. The team classes major incidents as fatal accidents involving 3+ casualties. It wants to learn more about the characteristics of these major incidents in order to brainstorm interventions that could lower the number of deaths.

    Said task could be likened to the hunt for a serial killer who has been striking from anonymity thus far. Like most serial killers, major incidents leave behind not only a trail of destruction, but also data that could reveal their patterns of attack and ultimately lead to their neutralization.

    Objectives

    This report aims to help bring major incidents out from their hiding place amidst the accident data of 2020, so that they can be understood and hopefully neutralized, by achieving four objectives:

    1. Build a distinct profile of major incidents in general
    2. Determine the specific day of the week and time of day when most major incidents happen, and uncover any distinctive patterns at this time
    3. Establish the most relevant features that can be used to predict major incidents
    4. Present recommendations that could help the planning team reduce major incidents

    The data

    The reporting department collects data on every accident that is reported. The 2020 accident data is provided together with a lookup file.

    Published by the Department for Transport: https://data.gov.uk/dataset/road-accidents-safety-data. Contains public sector information licensed under the Open Government Licence v3.0.

    # Import libraries to be used
    import warnings
    warnings.filterwarnings('ignore')
    
    import sys
    !{sys.executable} -m pip install geopandas
    # Import packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    # Note: Matplotlib 3.6+ renamed this style to 'seaborn-v0_8-whitegrid' (old name removed in 3.8)
    plt.style.use('seaborn-whitegrid')
    import seaborn as sns
    from datetime import date
    import descartes
    import geopandas as gpd
    from shapely.geometry import Point, Polygon
    from sklearn.tree import DecisionTreeClassifier
    from matplotlib.gridspec import GridSpec
    # Import data
    accidents = pd.read_csv(r'./data/accident-data.csv')
    accidents.head()
    # Import lookup table
    lookup = pd.read_csv(r'./data/road-safety-lookups.csv')
    lookup.head()
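
A lookup file of this kind is typically used to translate the numeric codes in the accident data into readable labels. The following is only a minimal sketch on toy data: the column names `field_name`, `code`, and `label`, the helper `decode_field`, and the severity coding shown are assumptions made for illustration, and the real lookup file may be laid out differently.

```python
import pandas as pd

# Toy stand-ins for the accident data and the lookup table.
# Column names and codes are assumptions, not the real file schema.
toy_accidents = pd.DataFrame({"accident_severity": [1, 3, 2, 3]})
toy_lookup = pd.DataFrame({
    "field_name": ["accident_severity"] * 3,
    "code": [1, 2, 3],
    "label": ["Fatal", "Serious", "Slight"],
})


def decode_field(df, lookup_df, field):
    """Return the values of `field` with codes replaced by lookup labels."""
    rows = lookup_df[lookup_df["field_name"] == field]
    mapping = dict(zip(rows["code"], rows["label"]))
    return df[field].map(mapping)


toy_accidents["accident_severity_label"] = decode_field(
    toy_accidents, toy_lookup, "accident_severity"
)
print(toy_accidents)
```

Keeping the decoded labels in a separate column leaves the original codes available for modelling while making plots and tables easier to read.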

    Forensic tools

    Our killer is hiding somewhere in the raw data, so we will take the following two steps:

    • Prepare the data for analysis
    • Develop some visualization tools to help identify patterns in major incidents

    Note: this section presents the data transformations and Python code used to process the data. Readers not interested in these technical details can jump ahead to the next section of this report, titled Profiling the killer.

    Preparing the data

    The data is first imported and checked for missing entries. A few records are missing their latitude and longitude. Since the number of incomplete records is small (14) and none of them belongs to the class of interest (i.e., major incidents), they can be safely dropped, leaving a total of 91,185 accidents to analyse. After ruling out the existence of duplicate records, the data is passed through the following transformations:

    • Date and time fields are converted to timestamps
    • Month and Hour fields are extracted from said timestamps
    • Major incidents are labelled 1 and all other accidents 0
    • Categorical features are identified, and their data type changed accordingly
    • Lists of numerical and categorical predictors are prepared
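
The transformations listed above can be sketched on a few toy records. This is an illustration under stated assumptions, not the code used in the report: the column names, the day-first date format, and the coding of fatal accidents as severity 1 are all assumed here.

```python
import pandas as pd

# Toy records; column names and the severity coding (1 = fatal) are assumptions.
toy = pd.DataFrame({
    "date": ["04/07/2020", "15/08/2020", "22/08/2020"],
    "time": ["22:05", "16:30", "09:15"],
    "accident_severity": [1, 3, 1],
    "number_of_casualties": [4, 5, 1],
    "speed_limit": [60, 30, 60],
    "urban_or_rural_area": [2, 1, 2],
})

# Combine the date and time strings into one timestamp (day-first format assumed).
toy["timestamp"] = pd.to_datetime(
    toy["date"] + " " + toy["time"], format="%d/%m/%Y %H:%M"
)

# Extract the month and hour from the timestamp.
toy["month"] = toy["timestamp"].dt.month
toy["hour"] = toy["timestamp"].dt.hour

# Label major incidents (fatal with 3+ casualties) as 1, everything else as 0.
toy["major_incident"] = (
    (toy["accident_severity"] == 1) & (toy["number_of_casualties"] >= 3)
).astype(int)

# Mark coded fields as categorical and keep separate lists of predictors.
categorical_predictors = ["urban_or_rural_area"]
numerical_predictors = ["speed_limit"]
toy = toy.astype({col: "category" for col in categorical_predictors})

print(toy[["timestamp", "month", "hour", "major_incident"]])
```

Passing an explicit `format` to `pd.to_datetime` avoids day and month being swapped silently on ambiguous dates such as 04/07/2020.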
    # Check for missing values and data types
    accidents.info()
    # Check records with null entries
    accidents[accidents.isnull().any(axis=1)]
    # Drop records with null entries
    original_rows = accidents.shape[0]
    accidents.dropna(inplace=True)
    print('Dropped {} records with null entries'.format(original_rows - accidents.shape[0]))
    # Check for duplicates
    accidents.duplicated().any()
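
Before dropping incomplete records, it is worth confirming that none of them is a major incident. A minimal sketch of such a check on toy data follows; the column names and the coding of fatal accidents as severity 1 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy records with some missing coordinates; names and codes are assumptions.
toy = pd.DataFrame({
    "accident_severity": [1, 3, 2, 1],
    "number_of_casualties": [3, 1, 2, 1],
    "latitude": [51.5, np.nan, 53.4, np.nan],
    "longitude": [-0.1, np.nan, -2.2, -1.5],
})

# Select every record with at least one missing entry.
incomplete = toy[toy.isnull().any(axis=1)]

# Count how many of those are major incidents (fatal with 3+ casualties).
is_major = (incomplete["accident_severity"] == 1) & (
    incomplete["number_of_casualties"] >= 3
)
major_lost = int(is_major.sum())
print(f"{len(incomplete)} incomplete records, {major_lost} of them major incidents")

# Drop the incomplete records; with major_lost == 0 no major incident is lost.
toy_clean = toy.dropna().reset_index(drop=True)
```

If `major_lost` were greater than zero, imputing or manually recovering the coordinates would be preferable to dropping, since major incidents are the rare class of interest.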