Analyzing Google Sheets Data in Python
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner

    Code-along 2024-02-27 Analyzing Google Sheets Data in Python

    In this code-along, we will be analyzing Google Analytics data. The sample dataset contains obfuscated GA360 data for August 1, 2017 from the Google Merchandise Store, a real ecommerce store selling Google branded merchandise (Source).

    This data is typical for what you'd see for an ecommerce website. It contains session data with traffic source, location and transcation info. Other data has been anonymized. The data is available for in a publicly available Google Sheet. We will be using a combination of SQL and Python to analyze this dataset, getting more insights into what is driving sales on this particular day.

    Task 0: Setup

    • Make sure you're signed into Google.
    • Click this link to copy a Google Sheet with the data to your own Google account.

    Task 1: Import the session data

    In the SQL cell below, click "Connect a database", click "Google Sheets", follow the instructions, and select your copy of the Google Sheets file. Use a SQL statement to pull in all the data from the Google Sheet.

    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT * FROM main.ga_sessions
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Task 2: Import packages

    We'll use pandas and plotly, so let's pull these in.

    # Import packages
    import pandas as pd
    import plotly.express as px

    Task 3: Clean the data

    Let's inspect the types of the columns, and make adjustments where needed.

    df.dtypes
    df_clean = df.copy()
    df_clean

    Task 4: Explor the data

    Use the predefined plotting function to explore different grouped session counts as either a bar chart or a pie chart

    def plot_sessions_per_group(df, group, viz_type='bar'):
        sessions_per_group = df.groupby(group).size().reset_index(name='sessions').sort_values(by='sessions')
        if viz_type == 'bar':
            return px.bar(sessions_per_group,
                          x=group,
                          y='sessions',
                          title=f'Number of sessions per {group}',
                          text='sessions')
        elif viz_type == 'pie':
            return px.pie(sessions_per_group,
                          names=group,
                          values='sessions',
                          title=f'Distribution of sessions per {group}')
        else:
            raise ValueError("viz_type can only be 'bar' or 'pie'")

    Task 5: Show hourly visits per channelGrouping

    Adjust the code below to show the hourly visits throughout the day per channelGrouping.

    import pandas as pd
    import plotly.express as px
    
    # Assuming df_clean is a DataFrame that has been defined earlier in the code
    df_clean['visitStartTime'] = pd.to_datetime(df_clean['visitStartTime'], unit='s').dt.tz_localize('UTC').dt.tz_convert('America/Los_Angeles')
    df_clean['hour'] = df_clean['visitStartTime'].dt.to_period('H')
    visits_per_month = df_clean.groupby(['hour']).size().reset_index(name='visits')
    visits_per_month['hour'] = visits_per_month['hour'].astype(str)
    px.line(visits_per_month, x='hour', y='visits', title='Visits per hour throughout the day', markers=True)