Project: Analyzing River Thames Water Levels

Beta

Analyzing River Thames Water Levels

Time series data is everywhere, from watching your stock portfolio to monitoring climate change, and even live-tracking as local cases of a virus become a global pandemic. In this project, you’ll work with a time series that tracks the tide levels of the Thames River. You’ll first load the data and inspect it data visually, and then perform calculations on the dataset to generate some summary statistics. You’ll end by decomposing the time series into its component attributes and analyzing them.

The original dataset is available from the British Oceanographic Data Center here and you can read all about this fascinating archival story in this article from the Nature journal.

Here's a map of the locations of the tidal gauges along the River Thames in London.

The dataset comes with a file called Data_description.pdf. The dataset consists of 13 .txt files, containing comma separated data. We'll begin by analyzing one of them, the London Bridge gauge, and preparing it for analysis. The same code can be used to analyze data from other files (i.e. other gauges along the river) later.

Variable Name	Description	Format
Date and time	Date and time of measurement to GMT. Note the tide gauge is accurate to one minute.	dd/mm/yyyy hh:mm:ss
Water level	High or low water level measured by tide gauge. Tide gauges are accurate to 1 centimetre.	metres (Admiralty Chart Datum (CD), Ordnance Datum Newlyn (ODN or Trinity High Water (THW))
Flag	High water flag = 1, low water flag = 0	Categorical (0 or 1)

# We've imported your first Python package for you, along with a function you will need called IQR
import pandas as pd
import datetime as dt

def IQR(column): 
    q25, q75 = column.quantile([0.25, 0.75])
    return q75-q25

# Loading in the data
lb = pd.read_csv('data/10-11_London_Bridge.txt')
# print(lb.columns)

# Subset and rename the first three column
# df = lb[['Date and time', ' water level (m ODN)', ' flag']]
df = lb.iloc[:, :3]
df = df.rename(columns = {'Date and time': 'datetime', ' water level (m ODN)': 'water_level', ' flag': 'is_high_tide'})
# print(df.columns)
# print(df.head())

# Convert datetime column into datetime data type
df['datetime'] = pd.to_datetime(df['datetime'])

# Convert water level column to float
df['water_level'] = df['water_level'].astype('float')
# print(df.water_level.dtype)

# Create the year column
df['year'] = df['datetime'].dt.year
# print(df['year'].value_counts())

# Create the month column
df['month'] = df['datetime'].dt.month
# print(df['month'].value_counts())

# Filter the high tide value from is_high_tide column
tide_high = df.query('is_high_tide == 1')['water_level']
# print(tide_high)

# Filter the low tide value from is_high_tide column
tide_low = df.query('is_high_tide == 0')['water_level']
# print(tide_low)

# Generate descriptive statistics of high tide value
high_statistics = tide_high.agg(['mean', 'median', IQR])
# print(high_statistics)

# Generate descriptive statistics of low tide value
low_statistics = tide_low.agg(['mean', 'median', IQR])
# print(low_statistics)

# Count the number of days of high tide data per year
all_high_days = df.query('is_high_tide == 1').groupby('year').count()['water_level']
# print(all_high_days)

# Count the number of day of water level above 75 percentile
high_days = df.query(f'(water_level > {tide_high.quantile(0.75)}) & (is_high_tide == 1)').groupby('year').count()['water_level']
# print(high_days)

# Calculate the high ratio
high_ratio = (high_days / all_high_days).reset_index()
print(high_ratio)

# Count the number of days of low tide data per year
all_low_days = df.query('is_high_tide == 0').groupby('year').count()['water_level']
print(all_low_days)

# Count the number of da of water level below 25 percentile
low_days = df.query(f'(water_level < {tide_low.quantile(0.25)}) & (is_high_tide == 0)').groupby('year').count()['water_level']
print(low_days)

# Calculate the low ratio
low_ratio = (low_days / all_low_days).reset_index()
print(low_ratio)

# Store the solutions into a dictionary
solution = {'high_statistics': high_statistics, 'low_statistics': low_statistics, 'high_ratio': high_ratio, 'low_ratio':low_ratio}
print(solution)

Project: Analyzing River Thames Water Levels

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Analyzing River Thames Water Levels

Analyzing River Thames Water Levels