Spotify Music Data

    This dataset contains roughly 600 songs that ranked among the top songs of the year from 2010 to 2019, as measured by Billboard. For every song it includes data pulled from Spotify, such as beats per minute, amount of spoken words, loudness, and energy.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    data = pd.read_csv("spotify_top_music.csv", index_col=0)

    Data dictionary

    #   Variable   Explanation
    0   title      The title of the song
    1   artist     The artist of the song
    2   top genre  The genre of the song
    3   year       The year the song appeared on the Billboard chart
    4   bpm        Beats per minute: the tempo of the song
    5   nrgy       Energy: higher values mean more energetic (fast, loud)
    6   dnce       Danceability: higher values mean it's easier to dance to
    7   dB         Decibels: the loudness of the song
    8   live       Liveness: likelihood that the song was recorded with a live audience
    9   val        Valence: higher values mean a more positive sound (happy, cheerful)
    10  dur        The duration of the song
    11  acous      Acousticness: likelihood that the song is acoustic
    12  spch       Speechiness: higher values mean more spoken words
    13  pop        Popularity: higher values mean more popular
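
The abbreviated column names can be hard to read in tables and plot labels. As a minimal sketch, the mapping below renames them following the data dictionary. The descriptive names on the right (for example `energy`, `danceability`) and the one-row `sample` frame with placeholder values are assumptions made for illustration, not part of the dataset.

```python
import pandas as pd

# Hypothetical descriptive names derived from the data dictionary.
COLUMN_NAMES = {
    "top genre": "genre",
    "nrgy": "energy",
    "dnce": "danceability",
    "dB": "loudness_db",
    "live": "liveness",
    "val": "valence",
    "dur": "duration",
    "acous": "acousticness",
    "spch": "speechiness",
    "pop": "popularity",
}


def with_descriptive_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the abbreviated columns renamed."""
    return df.rename(columns=COLUMN_NAMES)


# One-row stand-in for the real file, filled with placeholder values.
sample = pd.DataFrame(
    [["Song A", "Artist A", "pop", 2010, 120, 80, 70, -5, 10, 60, 200, 5, 4, 90]],
    columns=["title", "artist", "top genre", "year", "bpm", "nrgy", "dnce",
             "dB", "live", "val", "dur", "acous", "spch", "pop"],
)
renamed = with_descriptive_names(sample)
print(renamed.columns.tolist())
```

The rest of this notebook keeps the original short names, so the renaming is best applied to a copy. Renaming `pop` also sidesteps a pandas quirk: `data.pop` refers to the `DataFrame.pop` method rather than the column, so the column has to be selected as `data["pop"]`.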

    Source of dataset.

    Exploring data

    # Look at the first rows
    data.head()

    # Check data types
    data.dtypes

    # Check for null values
    data.isnull().sum()

    # 10 most popular songs
    data.sort_values(by="pop", ascending=False).head(10)[["title", "artist", "pop"]]
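
An equivalent way to get the top rows is `DataFrame.nlargest`, which does not need to sort the whole frame. The sketch below uses a small made-up frame (`toy`, a hypothetical stand-in for `data`) so that it runs on its own.

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
toy = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C", "Song D"],
    "artist": ["Artist 1", "Artist 2", "Artist 1", "Artist 3"],
    "pop": [70, 95, 88, 60],
})

# Sort-then-head, as in the cell above.
top_sorted = toy.sort_values(by="pop", ascending=False).head(2)[["title", "artist", "pop"]]

# nlargest picks the rows with the highest values directly.
top_nlargest = toy.nlargest(2, "pop")[["title", "artist", "pop"]]
print(top_nlargest)
```

When several songs share the same popularity at the cut-off, the two approaches may return the tied rows in a different order.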

    # Seaborn parameters for visualization
    sns.set(rc={"figure.figsize": (15, 7)})
    sns.set_context("notebook")
    sns.set_style("whitegrid")

    # Artists of the 10 most popular songs
    mst_pop_songs_art = data.sort_values(by="pop", ascending=False).head(10)["artist"].tolist()

    # Plot their popularity over time
    # errorbar=None requires seaborn >= 0.12 (use ci=None on older versions)
    sns.lineplot(x="year", y="pop", data=data[data["artist"].isin(mst_pop_songs_art)], hue="artist", errorbar=None).set(
        title="Popularity Over Time: Artists of the 10 Most Popular Songs")
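
    # Songs with the highest popularity in each year
    # Select the column with ["pop"]: attribute access is easily confused with the pop() method
    grouped = data.groupby("year", as_index=False)["pop"].max()
    merged = data.merge(grouped, on=["year", "pop"], how="inner")[["year", "title", "artist", "pop"]]
    merged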
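
Merging on the yearly maximum keeps every song that ties for the top spot, so a year can appear more than once in `merged`. If exactly one row per year is wanted, `idxmax` is one alternative. A minimal sketch on a made-up frame (`toy` and its values are hypothetical):

```python
import pandas as pd

# Made-up rows: Song A and Song B tie for the 2010 maximum.
toy = pd.DataFrame({
    "year": [2010, 2010, 2010, 2011, 2011],
    "title": ["Song A", "Song B", "Song C", "Song D", "Song E"],
    "artist": ["Artist 1", "Artist 2", "Artist 3", "Artist 1", "Artist 2"],
    "pop": [80, 80, 75, 90, 60],
})

# Approach used above: the inner merge keeps all tied rows.
yearly_max = toy.groupby("year", as_index=False)["pop"].max()
all_ties = toy.merge(yearly_max, on=["year", "pop"], how="inner")

# Alternative: idxmax gives the label of the first maximum in each year.
first_only = toy.loc[toy.groupby("year")["pop"].idxmax()]

print(all_ties[["year", "title", "pop"]])
print(first_only[["year", "title", "pop"]])
```

Which behavior is preferable depends on whether tied songs should all be shown in the bar plot or only one bar per year.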

    # Plot
    sns.barplot(x="year", y="pop", hue="title", data=merged, palette="Paired_r").set(title="Most Popular Songs by Year")

    # Move legend outside the plot
    plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0.)

    # 5 most popular artists in the dataset (by summed popularity)
    top_artists = data.groupby("artist", as_index=False)["pop"].sum().sort_values("pop", ascending=False).head(5)
    top_artists

    mst_pop_art = top_artists["artist"].tolist()

    # Plot the popularity of the 5 most popular artists over time
    # errorbar=None requires seaborn >= 0.12 (use ci=None on older versions)
    sns.lineplot(x="year", y="pop", data=data[data["artist"].isin(mst_pop_art)], hue="artist", errorbar=None).set(title="Most Popular Artists: Popularity Over Time")

    # Place legend to the lower right of the plot
    plt.legend(loc="lower right")
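
Ranking artists by summed popularity favors artists with many songs in the dataset. Putting the sum next to the mean and the song count makes that effect visible. A minimal sketch on made-up data (`toy`, the artist names, and the values are hypothetical):

```python
import pandas as pd

# Made-up rows: Artist 1 has many mid-popularity songs, Artist 2 a single hit.
toy = pd.DataFrame({
    "artist": ["Artist 1", "Artist 1", "Artist 1", "Artist 2", "Artist 3", "Artist 3"],
    "pop": [60, 55, 65, 95, 80, 70],
})

# Named aggregation: one output column per statistic.
summary = (
    toy.groupby("artist")["pop"]
    .agg(total="sum", average="mean", songs="count")
    .sort_values("total", ascending=False)
)
print(summary)
```

In this toy example the artist with the highest total is the one with the most songs, while the artist with the highest average has only one song, so the two rankings answer different questions.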

    # 5 genres with the highest summed popularity in the dataset
    most_popular_genres = data.groupby("top genre", as_index=False)["pop"].sum().sort_values("pop", ascending=False).head(5)
    most_popular_genres

    # Names of the most popular genres
    most_popular_genres_l = most_popular_genres["top genre"].tolist()

    # Plot
    # errorbar=None requires seaborn >= 0.12 (use ci=None on older versions)
    sns.lineplot(x="year", y="pop", data=data[data["top genre"].isin(most_popular_genres_l)], hue="top genre", errorbar=None).set(
        title="Top Genres: Popularity Over Time")
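
When several songs share the same year and hue value, `sns.lineplot` aggregates them with its default estimator, the mean, so each point in the line plots above is an average popularity for that year. The sketch below computes the same kind of values explicitly, on made-up data (`toy` and the genre labels are hypothetical):

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
toy = pd.DataFrame({
    "year": [2010, 2010, 2010, 2011, 2011],
    "top genre": ["genre a", "genre a", "genre b", "genre a", "genre b"],
    "pop": [60, 80, 50, 90, 70],
})

# Mean popularity per year and genre: rows are years, columns are genres.
yearly_mean = toy.groupby(["year", "top genre"])["pop"].mean().unstack("top genre")
print(yearly_mean)
```

A table like this is a convenient way to check individual points on the plot, or to spot years in which a genre has no songs at all (those cells come out as `NaN`).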