Analyzing a Time Series of the Thames River in Python
Time series data is everywhere, from watching your stock portfolio to monitoring climate change, and even live-tracking as local cases of a virus become a global pandemic. In this live code-along, you'll work with a time series that tracks the tide levels of the Thames River. You'll first load the data and inspect it visually, and then perform calculations on the dataset to generate some summary statistics. Next, you'll decompose the time series into its component attributes. You'll end with a taster of autocorrelation: a first step in time series forecasting.
The original dataset is available from the British Oceanographic Data Centre, and you can read all about this fascinating archival story in an article in Nature.
Here's a map of the locations of the tidal gauges along the River Thames in London.
# Package imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Task 1: Read one file to explore the data format and prepare the data for analysis.
The dataset consists of 13 .txt files containing comma-separated data. Navigate to the "Browse and Upload Files" tab in the toolbar on the right to see the data in a folder called `Data/`. We'll begin by analyzing one of these files and preparing it for analysis. We can then create a helper function in case you are interested in analyzing other data later.

The dataset comes with a file called `Data_description.pdf`, which describes the variables:
| Variable Name | Description | Format |
|---|---|---|
| Date and time | Date and time of measurement, in GMT. Note the tide gauge is accurate to one minute. | dd/mm/yyyy hh:mm:ss |
| Water level | High or low water level measured by tide gauge. Tide gauges are accurate to 1 centimetre. | metres (Admiralty Chart Datum (CD), Ordnance Datum Newlyn (ODN), or Trinity High Water (THW)) |
| Flag | High water flag = 1, low water flag = 0 | Categorical (0 or 1) |
Let's begin by loading the London Bridge data. When loading time series data, always look out for the time zone the data is provided in. Sometimes, your data might be provided in UTC, and will need to be converted to local time if you want to do local analysis. Fortunately, the description above tells us the data is in GMT, which is the same as coordinated universal time (UTC).
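As an aside, here is a minimal sketch of what that UTC-to-local conversion would look like in pandas. The timestamps below are invented for illustration, not taken from the Thames data (which, being in GMT, needs no conversion):

```python
import pandas as pd

# Hypothetical timestamps, not from the Thames dataset. This shows the
# general pattern: localize naive timestamps to UTC, then convert to a
# local time zone for local analysis.
ts = pd.to_datetime(["25/06/2010 08:30:00"], dayfirst=True)
ts_utc = ts.tz_localize("UTC")                  # mark the naive timestamps as UTC
ts_london = ts_utc.tz_convert("Europe/London")  # convert to UK local time
print(ts_london[0])  # in June the UK is on BST, one hour ahead of UTC
```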
Instructions
- Use pandas to read the London Bridge dataset from the CSV file named `Data/10-11_London_Bridge.txt` and assign it to the variable `lb`.
- Display `lb`.
lb = pd.read_csv('Data/10-11_London_Bridge.txt')
lb
Since one of the column headings in the CSV file had a comma (`"flag, HW=1 or LW=0"`), `pd.read_csv` has created an extra, empty column. We'll need to drop this extra column and rename our column headings. Shorter and more memorable column names will facilitate our analysis later on.
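To see why this happens, here is a small mock. The rows below are invented to mimic the file's layout, not the real file contents:

```python
import io
import pandas as pd

# Invented rows mimicking the file's layout: the unquoted comma in the
# header splits "flag, HW=1 or LW=0" into two column names, and the
# trailing comma on each data row gives pandas a fourth, empty field.
raw = io.StringIO(
    "Date and time,water level (m ODN),flag, HW=1 or LW=0\n"
    "13/05/1911 15:40:00,3.7,1,\n"
)
mock = pd.read_csv(raw)
print(len(mock.columns))              # 4 columns instead of 3
print(mock.iloc[:, -1].isna().all())  # the extra column holds no data
```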
Instructions
- Call `lb.info()` or `lb.describe()` to confirm that the last column is empty and contains no data.
- Create a new DataFrame `df` which takes only the first three columns, and rename them as `datetime`, `water_level`, and `is_high_tide`, respectively.
lb.info()
df = lb[lb.columns[0:3]].copy()  # .copy() avoids a SettingWithCopyWarning when we modify df later
df.columns = ['datetime','water_level','is_high_tide']
Calling `lb.info()` above showed us that both the `datetime` and `water_level` columns are of type `object`. We'll convert these to the `datetime` and `float` types, respectively. We'll also add two columns, `month` and `year`, which we'll need to access later on.
Instructions
- Use `pd.to_datetime()` to convert the `datetime` column to the `datetime` format. Since the dataset is large, this step can take a few seconds.
- Use `.astype(float)` to convert the `water_level` column to the `float` format.
df['datetime'] = pd.to_datetime(df['datetime'], dayfirst=True)  # dates are dd/mm/yyyy per the data description
df['water_level'] = df['water_level'].astype(float)
df['month'] = df['datetime'].dt.month
df['year'] = df['datetime'].dt.year
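We mentioned earlier that a helper function would let you repeat this preparation on the other gauge files. One possible sketch, assuming every file shares the London Bridge layout (the function name and docstring are our own, not part of the dataset):

```python
import pandas as pd

def load_tide_data(path):
    """Read one tide-gauge file and tidy it, following the steps above.

    Assumes the same layout as the London Bridge file: three data
    columns of interest plus a spurious empty fourth column.
    """
    raw = pd.read_csv(path)
    df = raw[raw.columns[0:3]].copy()  # drop the extra empty column
    df.columns = ["datetime", "water_level", "is_high_tide"]
    df["datetime"] = pd.to_datetime(df["datetime"], dayfirst=True)
    df["water_level"] = df["water_level"].astype(float)
    df["month"] = df["datetime"].dt.month
    df["year"] = df["datetime"].dt.year
    return df
```

With this in place, each of the other 12 files can be loaded with a single call, e.g. `load_tide_data('Data/10-11_London_Bridge.txt')`.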