Beta
Exploratory data analysis with PostgreSQL
Date: 18.6.2023
Data: Cinema dataset by Datacamp. It contains 5 tables:
- descriptions (columns: description, length, rating),
- films (columns: id, title, release_year, country, duration, language, certification, gross, budget),
- people (columns: id, name, birthdate, deathdate),
- reviews (columns: id, film_id, num_user, num_critic, imdb_score, num_votes, facebook_likes)
- roles (columns: id, film_id, person_id, role).
Exploratory data analysis steps:
- Import
- Tidy
- Explore & Understand: Transform
- Explore & Understand: Model
- Explore & Understand: Visualise
- Communicate
Step 1. Import
1. Modify tables
Create table (if needed)
CREATE TABLE IF NOT EXISTS cinema.films (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
release_year INTEGER NOT NULL,
country VARCHAR(255) NOT NULL,
duration INTEGER NOT NULL,
language VARCHAR(255) NOT NULL,
certification VARCHAR(255) NOT NULL,
gross NUMERIC(10,2) NOT NULL,
budget NUMERIC(10,2) NOT NULL
);
Add column
ALTER TABLE cinema.films
ADD COLUMN shooting_location VARCHAR(255);
Change data type of the column
ALTER TABLE cinema.films
ALTER COLUMN budget TYPE TEXT;
2. Exploring tables
- descriptions table
Unknown integration
DataFrameavailable as
descriptions
variable
SELECT *
FROM cinema.descriptions;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
- films table
Unknown integration
DataFrameavailable as
films
variable
SELECT *
FROM cinema.films;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
- people table
Unknown integration
DataFrameavailable as
people
variable
SELECT *
FROM cinema.people;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
- reviews table
Unknown integration
DataFrameavailable as
reviews
variable
SELECT *
FROM cinema.reviews;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
- roles table
Unknown integration
DataFrameavailable as
roles
variable
SELECT *
FROM cinema.roles;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.