Calculate Shifts, Percent Change, and Windows on Time Series
I use this template to perform foundational manipulations on my time-series data. It covers shifting the data backward or forwards in time (i.e., lags and leads), calculating the percent change between periods, and window aggregations. There are many applications for these types of manipulations, including tracking financial assets, sales forecasting, and analyzing marketing data.
To swap in your dataset in this template, the following is required:
- You must have a dataset with a date column that can be parsed by pandas. This is checked in the code, and if you encounter difficulties, you can consult the documentation for further assistance.
- You must have at least one variable that you are interested in analyzing (e.g., price, sales, etc.).
The placeholder dataset in this template is Google stock price data, containing the closing price on each trading day.
# Load packages
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype as is_datetime
Setting Up Your Data
Before you begin, you will want to set up your data correctly. The read_csv() function from pandas offers several arguments that make working with time series easier from the start:
index_col
will allow you to immediately set the date as the index, allowing easier manipulations afterward.parse_dates
instructs pandas to parse the index as a date if possible.
You can then slice the DataFrame to select the dates/times you are most interested in using. You can read more about time series and date functionality within pandas here. The string you use to slice the DataFrame can be a partial match or a full match.
# Replace this with the name of the column that contains your date information
date_col = "date"
# Replace with the file you want to use and load your dataset into a DataFrame
df = pd.read_csv("google.csv", index_col=date_col, parse_dates=True)
# Check that the index is correctly converted to a date
print("The index been parsed as a date: " + str(is_datetime(df.index)))
If the code above returns False, then you will need to use pandas' to_datetime() function to correctly set the index to a date.
Next, you can specify the variable of interest and choose a date range.
# Specify the variable of interest
var_col = "close" # Replace this with the name of the column you want to analyze
# Reduce the DataFrame down to the relevant columns
df_subset = df[[var_col]] # Pass in the column(s) of interest as a list here
# Select the date range you want to explore
df_time = df_subset[
"2020-1-1":"2021-1-1" # Pass in the date ranges you are interested in here
].copy()
# Reset the index
df_time.reset_index(inplace=True)
# Preview the DataFrame
df_time
Shifting Data
The .shift() method allows you to shift data by a given number of periods. A negative number will produce a lag backward in time, and a positive number will produce a lead forward in time.
Below, we use Workspace's Visualize feature (available in the DataFrame output) to plot the resulting DataFrame. To create the plot, we select "Line" as the type, assign "date" and "price" to the x and y axis respectively, and group by the "period". To add a title to the plot, simply click in the space where the title should appear and add your own!
# Use this line to specify the number of periods to shift
shift_periods = 60
# Create a column shifted backward by the number of periods specified above (lag)
df_time["lag_data"] = df_time[var_col].shift(-shift_periods)
# Create a column shifted forward by the number of periods specified above (lead)
df_time["lead_data"] = df_time[var_col].shift(shift_periods)
# Melt the DataFrame in preparation for visualization and drop null values
df_shift = df_time.melt(id_vars="date", var_name="period", value_name="price").dropna()
# Inspect the DataFrame
df_shift
Original Data with 60 Period Lag and Lead
Percent Change
The .pct_change() method allows you to calculate the percentage change between the current row and another previous row. There are two things to note about the code shown below.
- The
periods
parameter specifies which row to use when calculating the percentage change. It defaults to 1, which means it uses the immediately previous row. Here, 30 periods are used. - By default,
.pct_change()
returns a decimal..mul(100)
multiplies the percentage by 100 for easier reading.
Note: We again use the Workspace "Visualize" feature in the output of the DataFrame to easily create a line plot of our data.
# Use this line to specify the rate of change you want to calculate
pct_return_periods = 30
# Create a column with the percentage increase and multiply by 100
df_time["percent_change"] = (
df_time[var_col].pct_change(periods=pct_return_periods).mul(100)
)
# Select relevant columns and drop null values
df_pct = df_time[["date", "percent_change"]].dropna()
# Preview the DataFrame
df_pct