EDA with PostgreSQL: Cinema dataset
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner

    Exploratory data analysis with PostgreSQL

    Date: 18.6.2023

    Data: Cinema dataset by Datacamp. It contains 5 tables:

    1. descriptions (columns: description, length, rating),
    2. films (columns: id, title, release_year, country, duration, language, certification, gross, budget),
    3. people (columns: id, name, birthdate, deathdate),
    4. reviews (columns: id, film_id, num_user, num_critic, imdb_score, num_votes, facebook_likes)
    5. roles (columns: id, film_id, person_id, role).

    Exploratory data analysis steps:

    1. Import
    2. Tidy
    3. Explore & Understand: Transform
    4. Explore & Understand: Model
    5. Explore & Understand: Visualise
    6. Communicate

    Step 1. Import

    1. Modify tables

    Create table (if needed)

    CREATE TABLE IF NOT EXISTS cinema.films (

    id SERIAL PRIMARY KEY,

    title VARCHAR(255) NOT NULL,

    release_year INTEGER NOT NULL,

    country VARCHAR(255) NOT NULL,

    duration INTEGER NOT NULL,

    language VARCHAR(255) NOT NULL,

    certification VARCHAR(255) NOT NULL,

    gross NUMERIC(10,2) NOT NULL,

    budget NUMERIC(10,2) NOT NULL

    );

    Add column

    ALTER TABLE cinema.films

    ADD COLUMN shooting_location VARCHAR(255);

    Change data type of the column

    ALTER TABLE cinema.films

    ALTER COLUMN budget TYPE TEXT;

    2. Exploring tables

    1. descriptions table
    Unknown integration
    DataFrameavailable as
    descriptions
    variable
    SELECT *
    FROM cinema.descriptions;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    1. films table
    Unknown integration
    DataFrameavailable as
    films
    variable
    SELECT *
    FROM cinema.films;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    1. people table
    Unknown integration
    DataFrameavailable as
    people
    variable
    SELECT *
    FROM cinema.people;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    1. reviews table
    Unknown integration
    DataFrameavailable as
    reviews
    variable
    SELECT *
    FROM cinema.reviews;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    1. roles table
    Unknown integration
    DataFrameavailable as
    roles
    variable
    SELECT *
    FROM cinema.roles;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.