Workspace
Aroom Gonzalez/

Analyzing online ticket sales with Amazon Redshift

0
Beta
Spinner

Analyzing Online Ticket Sales with Amazon Redshift

In this workspace, we will be accessing data stored in Amazon Redshift, a data warehouse product that is part of Amazon Web Services. More specifically, we'll be analyzing sales activity from a fictional ticketing website where users both buy and sell tickets online for sporting events, shows, and concerts (source).

To consult the solution, head over to the file browser and select notebook-solution.ipynb.

Explore events

Unknown integration
DataFrameavailable as
df
variable
-- List all the events
SELECT * FROM public.event
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

This is linking up to several other tables in the warehouse, such as venue, category and date. Let's join things up.

Unknown integration
DataFrameavailable as
df
variable
SELECT *
FROM event
INNER JOIN public.venue USING(venueid)
INNER JOIN public.category USING(catid)
INNER JOIN public.date USING(dateid)
LIMIT 100
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

There's a starttime column coming from the event table and there's also a caldate column, coming from the date table. Let's see what's up with this.

Unknown integration
DataFrameavailable as
df
variable
SELECT
	CASE WHEN DATE(caldate) = DATE(starttime) THEN True ELSE False END AS same_date,
	COUNT(*)
FROM event
INNER JOIN date USING(dateid)
GROUP BY 1
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
SELECT MAX(DATEDIFF('hour', caldate, starttime))
FROM event
INNER JOIN date USING(dateid)
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Let's see how much events are happening in different cities.

Unknown integration
DataFrameavailable as
events_per_city
variable
SELECT 
	venuecity,
    COUNT(*) AS num_event
FROM event
INNER JOIN venue USING(venueid)
GROUP BY venuecity
ORDER BY num_event DESC
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
import plotly.express as px
px.bar(events_per_city, x = 'venuecity', y = 'num_event')

Explore listings and sales

Unknown integration
DataFrameavailable as
df
variable
-- show 100 listing records
SELECT * FROM public.listing
LIMIT 100
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
-- show 100 sales records
SELECT * FROM public.sales
LIMIT 100
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Let's see if multiple sales can happen for the same listing.




  • AI Chat
  • Code