Calculate Shifts, Percent Change, and Windows on Time Series
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner

    Calculate Shifts, Percent Change, and Windows on Time Series

    I use this template to perform foundational manipulations on my time-series data. It covers shifting the data backward or forwards in time (i.e., lags and leads), calculating the percent change between periods, and window aggregations. There are many applications for these types of manipulations, including tracking financial assets, sales forecasting, and analyzing marketing data.

    To swap in your dataset in this template, the following is required:

    • You must have a dataset with a date column that can be parsed by pandas. This is checked in the code, and if you encounter difficulties, you can consult the documentation for further assistance.
    • You must have at least one variable that you are interested in analyzing (e.g., price, sales, etc.).

    The placeholder dataset in this template is Google stock price data, containing the closing price on each trading day.

    # Load packages
    import pandas as pd
    from pandas.api.types import is_datetime64_any_dtype as is_datetime

    Setting Up Your Data

    Before you begin, you will want to set up your data correctly. The read_csv() function from pandas offers several arguments that make working with time series easier from the start:

    • index_col will allow you to immediately set the date as the index, allowing easier manipulations afterward.
    • parse_dates instructs pandas to parse the index as a date if possible.

    You can then slice the DataFrame to select the dates/times you are most interested in using. You can read more about time series and date functionality within pandas here. The string you use to slice the DataFrame can be a partial match or a full match.

    # Replace this with the name of the column that contains your date information
    date_col = "date"
    
    # Replace with the file you want to use and load your dataset into a DataFrame
    df = pd.read_csv("google.csv", index_col=date_col, parse_dates=True)
    
    # Check that the index is correctly converted to a date
    print("The index been parsed as a date: " + str(is_datetime(df.index)))

    If the code above returns False, then you will need to use pandas' to_datetime() function to correctly set the index to a date.

    Next, you can specify the variable of interest and choose a date range.

    # Specify the variable of interest
    var_col = "close"  # Replace this with the name of the column you want to analyze
    
    # Reduce the DataFrame down to the relevant columns
    df_subset = df[[var_col]]  # Pass in the column(s) of interest as a list here
    
    # Select the date range you want to explore
    df_time = df_subset[
        "2020-1-1":"2021-1-1"  # Pass in the date ranges you are interested in here
    ].copy()
    
    # Reset the index
    df_time.reset_index(inplace=True)
    
    # Preview the DataFrame
    df_time

    Shifting Data

    The .shift() method allows you to shift data by a given number of periods. A negative number will produce a lag backward in time, and a positive number will produce a lead forward in time.

    Below, we use Workspace's Visualize feature (available in the DataFrame output) to plot the resulting DataFrame. To create the plot, we select "Line" as the type, assign "date" and "price" to the x and y axis respectively, and group by the "period". To add a title to the plot, simply click in the space where the title should appear and add your own!

    # Use this line to specify the number of periods to shift
    shift_periods = 60
    
    # Create a column shifted backward by the number of periods specified above (lag)
    df_time["lag_data"] = df_time[var_col].shift(-shift_periods)
    
    # Create a column shifted forward by the number of periods specified above (lead)
    df_time["lead_data"] = df_time[var_col].shift(shift_periods)
    
    # Melt the DataFrame in preparation for visualization and drop null values
    df_shift = df_time.melt(id_vars="date", var_name="period", value_name="price").dropna()
    
    # Inspect the DataFrame
    df_shift
    Current Type: Line
    Current X-axis: date
    Current Y-axis: price
    Current Color: period

    Original Data with 60 Period Lag and Lead

    Percent Change

    The .pct_change() method allows you to calculate the percentage change between the current row and another previous row. There are two things to note about the code shown below.

    • The periods parameter specifies which row to use when calculating the percentage change. It defaults to 1, which means it uses the immediately previous row. Here, 30 periods are used.
    • By default, .pct_change() returns a decimal. .mul(100) multiplies the percentage by 100 for easier reading.

    Note: We again use the Workspace "Visualize" feature in the output of the DataFrame to easily create a line plot of our data.

    # Use this line to specify the rate of change you want to calculate
    pct_return_periods = 30
    
    # Create a column with the percentage increase and multiply by 100
    df_time["percent_change"] = (
        df_time[var_col].pct_change(periods=pct_return_periods).mul(100)
    )
    
    # Select relevant columns and drop null values
    df_pct = df_time[["date", "percent_change"]].dropna()
    
    # Preview the DataFrame
    df_pct