Explore a DataFrame
Welcome to your workspace! Here, you can write code and text, run analyses, and share your data insights. In this walkthrough, you will learn the basics of Workspace as you load data from a SQL database and explore it with Python!
Keep an eye out for 💪 icons throughout the notebook. These will indicate opportunities for you to customize the code and flex your coding abilities!
🏃 Run a query
In Workspace, you can add and run SQL cells to access relational databases. The query below retrieves data from a sample Redshift database containing information on ticket sales. It stores the results of the query as a pandas DataFrame with a name of your choosing.
You can click the "Browse tables" button to view the available tables and columns, or continue with the existing query.
To run a cell, click inside it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell and automatically switch to the next cell.
-- Select the details and location of different events SELECT eventname AS event_name, catname AS category_name, catgroup AS category_group, venuecity AS city, caldate AS date, SUM(qtysold) AS total_sold, SUM(qtysold * pricepaid) AS total_sales FROM event INNER JOIN date USING(dateid) INNER JOIN category USING(catid) INNER JOIN venue USING(venueid) INNER JOIN sales USING(eventid) GROUP BY 1, 2, 3, 4, 5 ORDER BY total_sold DESC LIMIT 1000
🔭 Explore the data
You now have access to a DataFrame that you can explore further using Python. Just as you used a SQL cell above to query the ticket sales database, you can add code cells to write and run Python code.
Use the cell below to import pandas using the alias
pd, as well as to import the interactive plotting library Plotly Express as
As a reminder, you can run a cell by click inside it and pressing the Run button, or by pressing Shift + Return.
# Import libraries import pandas as pd import plotly.express as px # Preview the data event_details
.info() method prints a summary of a DataFrame. For each column, you can find its name, data type, and the number of non-null rows.
.describe() method returns helpful descriptive statistics for your data, excluding null values.
💪 By default,
.describe() will only return descriptive statistics for numeric values. Try adding
include="all" as an argument to the code below to also print out descriptive statistics for the categorical columns!
✏️ Markdown cells such as this one contain text, and can be edited to add your own notes, observations, and conclusions.
To edit the text, simply click inside the cell and click "Edit", or double-click the cell. You can then add text using Markdown and finalize it by clicking the "View" button, or running the cell using Shift + Enter.
💪 It's a good apreciation
🎨 Visualize the data
An essential part of exploratory analysis is the ability to visualize data. We will begin with a boxplot to identify differences in sales between event categories.
Plotly visualizations are interactive. Be sure to hover over each box to gain additional insights from the plot!
💪 You can add a title to Plotly Express plots by using the
title argument (e.g.,
title="My Fantastic Plot"). Try adding a title to this and the following plot to help readers interpret your visualizations!
# Create a boxplot of total sales by event category fig = px.box(event_details, x="category_group", y="total_sales", title= 'My Fantastic Plot') fig.show()
Scatter plots are a great way to visualize the relationship between two (or more) numeric variables.
💪 You can color points in a scatter plot using the
color parameter. Try adding
color="category_name" as an argument to the scatter plot to add an additional level to your plot!
# Create a scatter plot of total sold versus total sales fig = px.scatter(event_details, x="total_sold", y="total_sales", hover_data=["event_name"], color = 'category_name' ) fig.show()
🔬 Go forth and analyze!
Well done! You have successfully used SQL and Python to load data and explore the resulting DataFrame. Feel free to continue to explore the data and expand on this workspace.
When you're finished, be sure to share your analysis. Click the Share button to build a sleek publication that includes the rendered text, code cells, and code output. When you publish your work, it appears automatically on your DataCamp profile so you can build out your portfolio. If you do not want to share your work with the world, you can adjust the sharing settings.
After you have finished preparing your report, consider the following options:
- Try out our ready-to-use datasets. These cover a variety of topics and include flat files such as csvs and additional databases for you to test out your SQL skills!
- Kickstart your next project by using one of our templates. These provide the code and instructions on various data science topics, ranging from machine learning to visualization.
- Want to go at it on your own! Open a blank workspace and get coding!