Analyze Multiple Time Series
This template provides a playbook for analyzing multiple time series simultaneously. You will take an in-depth look at your time series data by:
- Loading and visualizing your data
- Inspecting the distribution
- Analyzing subsets of your data
- Decomposing time series into seasonality, trend and noise
- Visualizing correlations with a clustermap
# Load packages
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics import tsaplots
import statsmodels.api as sm
import seaborn as sns
SELECT *
FROM cinema.films
# Plot film duration by release year (uses the `data` DataFrame returned by the query above)
data.set_index("release_year")["duration"].plot()
1. Load and visualize your data
# Upload your data as CSV and load as a data frame
df = pd.read_csv(
    "data.csv",
    parse_dates=["datestamp"],  # Tell pandas which column(s) to parse as dates
    index_col="datestamp",  # Use a date column as your index
)
df.head()
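If you do not have a CSV at hand yet, the loading step can be tried on a small in-memory sample. The sketch below is illustrative only: the column names `Agriculture` and `Construction` and all values are invented, and `io.StringIO` stands in for the `data.csv` file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for data.csv (values are invented)
csv_text = """datestamp,Agriculture,Construction
2000-01-01,10.3,9.7
2000-02-01,11.5,10.6
2000-03-01,10.4,8.7
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["datestamp"],  # Parse the date column as datetimes
    index_col="datestamp",  # Use it as the index
)

# The index should be a DatetimeIndex rather than plain strings
print(type(sample.index).__name__)
print(sample.dtypes)
```

Checking the index type right after loading is a cheap way to catch a date column that pandas could not parse, which would otherwise show up later as a badly formatted time axis.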
# Plot settings
%config InlineBackend.figure_format='retina'
plt.rcParams["figure.figsize"] = (18, 10)
plt.style.use('ggplot')
# Plot all time series in the df DataFrame
ax = df.plot(
    colormap="Spectral",  # Set a colormap to avoid overlapping colors
    fontsize=10,  # Set fontsize
    linewidth=0.8,  # Set width of lines
)
# Set labels and legend
ax.set_xlabel("Date", fontsize=12)  # X axis label
ax.set_ylabel("Unemployment Rate", fontsize=12)  # Y axis label
ax.set_title("Unemployment rate of U.S. workers by industry", fontsize=15)
ax.legend(
    loc="center left",  # Set location of legend within bounding box
    bbox_to_anchor=(1.0, 0.5),  # Set location of bounding box
)
# Annotate your plots with vertical lines
ax.axvline(
    "2001-07-01",  # Position of vertical line
    color="red",  # Color of line
    linestyle="--",  # Style of line
    linewidth=2,  # Thickness of line
)
ax.axvline("2008-09-01", color="red", linestyle="--", linewidth=2)
# Show plot
plt.show()
2. Inspect the distribution
df.describe()
# Generate a boxplot
ax = df.boxplot(fontsize=10, vert=False)  # vert=False draws the boxes horizontally
ax.set_xlabel("Unemployment Percentage")
ax.set_title("Distribution of Unemployment by industry")
plt.show()
3. Analyze subsets of your data
a) Visualize (partial) autocorrelation
Autocorrelation is the correlation of a time series with a lagged copy of itself. It measures how strongly the value at one time step is related to the value a given number of steps (the lag) earlier. Partial autocorrelation measures the same relationship after removing the effect of the shorter lags in between.
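To make the idea of correlating a series with a lagged copy of itself concrete, here is a minimal sketch on synthetic data. The series and the helper `lag_correlation` are hypothetical and not part of the template. The helper computes the Pearson correlation between the series and its shifted copy, which is what pandas' `Series.autocorr` does; the estimator behind `plot_acf` is normalized slightly differently, so the values can differ a little.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with a 12-month cycle plus noise (invented data)
idx = pd.date_range("2000-01-01", periods=120, freq="MS")
rng = np.random.default_rng(0)
series = pd.Series(
    np.sin(2 * np.pi * np.arange(120) / 12) + rng.normal(0, 0.1, 120),
    index=idx,
)


def lag_correlation(s, lag):
    """Pearson correlation between a series and its copy shifted by `lag` steps."""
    return s.corr(s.shift(lag))


# Compare the hand-rolled value with pandas' built-in autocorr
for lag in (1, 6, 12):
    print(lag, round(lag_correlation(series, lag), 3), round(series.autocorr(lag), 3))
```

For a series with a yearly cycle, the correlation should be strongly positive at lag 12 (same point in the cycle) and strongly negative at lag 6 (opposite point in the cycle).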
# Display the autocorrelation plot of your time series
fig = tsaplots.plot_acf(
    df["Agriculture"],  # Change column to inspect
    lags=24,  # Set lag period
)
# Show plot
plt.show()
# Display the partial autocorrelation plot of your time series
fig = tsaplots.plot_pacf(
    df["Agriculture"],  # Change column to inspect
    lags=24,  # Set lag period
)
# Show plot
plt.show()