Explore a Data Frame
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner

    Exploring a Data Frame

    Welcome to your workspace! Here, you can write code and text, run analyses, and share your data insights. In this walkthrough, you will learn the basics of Workspace as you load data from a SQL database and explore it with R!

    Keep an eye out for 💪   icons throughout the notebook. These will indicate opportunities for you to customize the code and flex your coding abilities!

    🏃  Run a query

    In Workspace, you can add and run SQL cells to access relational databases. The query below retrieves data from a sample Redshift database containing information on ticket sales. It stores the results of the query as a pandas DataFrame with a name of your choosing.

    You can click the "Browse tables" button to view the available tables and columns, or continue with the existing query.

    To run a cell, click inside it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell and automatically switch to the next cell.

    Unknown integration
    Data frameavailable as
    event_details
    variable
    -- Select the details and location of different events
    SELECT 
    	eventname AS event_name,
        catname AS category_name,
        catgroup AS category_group,
        venuecity AS city,
        caldate AS date,
        SUM(qtysold) AS total_sold,
        SUM(qtysold * pricepaid) AS total_sales
    FROM event
    INNER JOIN date USING(dateid)
    INNER JOIN category USING(catid)
    INNER JOIN venue USING(venueid)
    INNER JOIN sales USING(eventid)
    GROUP BY 1, 2, 3, 4, 5
    ORDER BY total_sold DESC
    LIMIT 1000
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    🔭  Explore the data

    You now have access to a data frame that you can explore further using R. Just as you used a SQL cell above to query the ticket sales database, you can add code cells to write and run R code.

    Use the cell below to import the plotly and tidyverse libraries.

    As a reminder, you can run a cell by click inside it and pressing the Run button, or by pressing Shift + Return.

    # Install plotly
    install.packages("plotly", quiet = TRUE)
    
    # Import libraries
    suppressPackageStartupMessages(library(tidyverse))
    suppressPackageStartupMessages(library(plotly))
    
    # Preview the data
    event_details

    The glimpse() function prints the structure of a data frame. For each column, you can find its name, data type, a preview of the first few values in the column.

    glimpse(event_details)

    The summary() function returns a summary of a data frame, including the class and mode for categorical columns, and quantitative statistics for numerical columns.

    summary(event_details)

    ✏️  Markdown cells such as this one contain text, and can be edited to add your own notes, observations, and conclusions.

    To edit the text, simply click inside the cell and click "Edit", or double-click the cell. You can then add text using Markdown and finalize it by clicking the "View" button, or running the cell using Shift + Enter.

    💪  Try replacing the text in this cell with your observations on the output of the previous cell. You can also continue to use the "Add Markdown" button throughout this workspace to add more cells to share the insights from your analysis!

    🎨  Visualize the data

    An essential part of exploratory analysis is the ability to visualize data. We will begin with a boxplot to identify differences in sales between event categories.

    Plotly visualizations are interactive. Be sure to hover over each box to gain additional insights from the plot!

    💪  You can add a title to Plotly Express plots by passing your plot to the the layout() function (e.g., %>% layout(title = "My Fantastic Plot")). Try adding a title to this and the following plo to help readers interpret your visualizations!

    # Create a boxplot of total sales by event category
    fig <- plot_ly(event_details, x = ~category_group, y = ~total_sales, type = "box")
    fig

    Scatter plots are a great way to visualize the relationship between two (or more) numeric variables.

    💪  You can color points in a scatter plot using the color parameter. Try adding color = ~category_name as an argument to the scatter plot to add an additional level to your plot!

    # Create a scatter plot of total sold versus total sales
    fig <- plot_ly(event_details, 
                   x = ~total_sold, 
                   y = ~total_sales,
                   type = "scatter", 
                   mode = "markers")
    fig

    🔬  Go forth and analyze!

    Well done! You have successfully used SQL and R to load data and explore the resulting data frame. Feel free to continue to explore the data and expand on this workspace.

    When you're finished, be sure to share your analysis. Click the Share button to build a sleek publication that includes the rendered text, code cells, and code output. When you publish your work, it appears automatically on your DataCamp profile so you can build out your portfolio. If you do not want to share your work with the world, you can adjust the sharing settings.

    After you have finished preparing your report, consider the following options:

    • Try out our ready-to-use datasets. These cover a variety of topics and include flat files such as csvs and additional databases for you to test out your SQL skills!
    • Kickstart your next project by using one of our templates. These provide the code and instructions on various data science topics, ranging from machine learning to visualization.
    • Want to go at it on your own! Open a blank workspace and get coding!