Spotify Music Data
This dataset consists of roughly 600 songs that ranked among the top songs of each year from 2010 to 2019 (as measured by Billboard). For every song it includes audio features pulled from Spotify, such as beats per minute, speechiness (how much of the track is spoken word), loudness, and energy.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv("spotify_top_music.csv", index_col=0)
Data dictionary
Index | Variable | Explanation |
---|---|---|
0 | title | The title of the song |
1 | artist | The artist of the song |
2 | top genre | The genre of the song |
3 | year | The year the song appeared on the Billboard chart |
4 | bpm | Beats per minute: the tempo of the song |
5 | nrgy | The energy of the song: higher values mean more energetic (fast, loud) |
6 | dnce | The danceability of the song: higher values mean it's easier to dance to |
7 | dB | Decibels: the loudness of the song |
8 | live | Liveness: the likelihood that the song was recorded with a live audience |
9 | val | Valence: higher values mean a more positive sound (happy, cheerful) |
10 | dur | The duration of the song |
11 | acous | The acousticness of the song: the likelihood that the song is acoustic |
12 | spch | Speechiness: higher values mean more spoken words |
13 | pop | Popularity: higher values mean more popular |
Source of dataset.
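The abbreviated column names in the data dictionary can be hard to read, and `pop` also shares its name with the `DataFrame.pop` method, so `data.pop` refers to the method rather than the column. As an optional sketch, the columns could be renamed before analysis. The long names below and the helper `with_readable_names` are illustrative assumptions, not part of the dataset, and the rest of this notebook keeps the original names. The toy frame uses made-up values.

```python
import pandas as pd

# Hypothetical mapping from the dataset's abbreviated names to longer ones.
# The long names are illustrative choices, not part of the original dataset.
READABLE_NAMES = {
    "top genre": "genre",
    "nrgy": "energy",
    "dnce": "danceability",
    "dB": "loudness_db",
    "live": "liveness",
    "val": "valence",
    "dur": "duration",
    "acous": "acousticness",
    "spch": "speechiness",
    "pop": "popularity",
}


def with_readable_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a renamed copy of df; columns not in the mapping are left alone."""
    return df.rename(columns=READABLE_NAMES)


# Toy frame with made-up values, only to show the effect of the rename
toy = pd.DataFrame({"title": ["Song A"], "nrgy": [80], "pop": [75]})
renamed = with_readable_names(toy)
print(list(renamed.columns))
```

Because `rename` returns a copy, the original frame keeps its short column names.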
Exploring data
# Look at data
data.head()
# Check data types
data.dtypes
# Check for null values
data.isnull().sum()
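The type and null checks above can also be combined into a single overview table. This is a minimal sketch: `summarize_columns` is a hypothetical helper name, and the toy frame contains made-up values rather than rows from the dataset.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: its dtype, number of nulls, and number of distinct non-null values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_null": df.isnull().sum(),
        "n_unique": df.nunique(),
    })


# Toy frame with made-up values, including one missing title
toy = pd.DataFrame({"title": ["Song A", "Song B", None], "pop": [75, 80, 80]})
summary = summarize_columns(toy)
print(summary)
```

On the real data this would be called as `summarize_columns(data)`.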
# 10 most popular songs
data.sort_values(by="pop", ascending=False).head(10)[["title", "artist", "pop"]]
# Seaborn parameters for visualization
sns.set(rc={"figure.figsize":(15, 7)})
sns.set_context("notebook")
sns.set_style("whitegrid")
# Artists behind the 10 most popular songs
mst_pop_songs_art = data.sort_values(by="pop", ascending=False).head(10)["artist"].tolist()
# Plot the popularity of all songs by these artists over time
sns.lineplot(x="year", y="pop", data=data[data["artist"].isin(mst_pop_songs_art)], hue="artist", ci=None).set(
title="Popularity Over Time for Artists of the 10 Most Popular Songs")
# Songs with the highest popularity in each year
# (bracket access is used because `.pop` is also the name of a DataFrame method)
grouped = data.groupby("year", as_index=False)["pop"].max()
merged = data.merge(grouped, on=["year", "pop"], how="inner")[["year", "title", "artist", "pop"]]
merged
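The merge above keeps every song that ties for a year's maximum popularity, so a year can contribute more than one row. The sketch below shows two alternative ways to express this on a toy frame with made-up values: a `transform("max")` mask that keeps ties, and `idxmax`, which keeps only the first maximal row per year. Which behaviour is wanted is an analysis choice, not something the source specifies.

```python
import pandas as pd

# Toy data with made-up values; 2010 has a tie at the top
toy = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C", "Song D"],
    "artist": ["Artist 1", "Artist 2", "Artist 3", "Artist 1"],
    "year": [2010, 2010, 2011, 2011],
    "pop": [70, 70, 65, 80],
})
cols = ["year", "title", "artist", "pop"]

# Option 1: keep every song tied for the yearly maximum
is_yearly_max = toy["pop"] == toy.groupby("year")["pop"].transform("max")
top_with_ties = toy.loc[is_yearly_max, cols]

# Option 2: exactly one row per year (idxmax returns the first maximal row)
top_one_per_year = toy.loc[toy.groupby("year")["pop"].idxmax(), cols]

print(top_with_ties)
print(top_one_per_year)
```

With ties kept, a bar plot that uses `hue="title"` will draw several bars for the same year.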
# Plot
sns.barplot(x="year", y="pop", hue="title", data=merged, palette="Paired_r").set(title="Popularity of the Most Popular Song(s) in Each Year")
# Move legend outside the plot
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
# 5 most popular artists in the dataset, ranked by popularity summed over all of their songs
top_artists = data.groupby("artist", as_index=False)["pop"].sum().sort_values("pop", ascending=False).head(5)
top_artists
mst_pop_art = top_artists["artist"].tolist()
# Plot the popularity of the 5 most popular artists over time
sns.lineplot(x="year", y="pop", data=data[data["artist"].isin(mst_pop_art)], hue="artist", ci=None).set(title="Most Popular Artists' Popularity Over Time")
# Place legend to the lower right of the plot
plt.legend(loc='lower right')
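Ranking artists by summed popularity favours artists with many songs in the dataset. As a rough sketch of how the ranking can change, the example below computes the total, the mean, and the song count side by side on a toy frame with made-up values. The column names `total_pop`, `mean_pop`, and `n_songs` are assumptions chosen for illustration.

```python
import pandas as pd

# Toy data with made-up values: Artist 1 has more songs, Artist 2 has more popular ones
toy = pd.DataFrame({
    "artist": ["Artist 1"] * 4 + ["Artist 2"] * 2,
    "pop": [50, 50, 50, 50, 90, 90],
})

artist_stats = (
    toy.groupby("artist")["pop"]
    .agg(total_pop="sum", mean_pop="mean", n_songs="count")
    .sort_values("total_pop", ascending=False)
)
print(artist_stats)

# The two rankings disagree on this toy data
top_by_total = artist_stats["total_pop"].idxmax()
top_by_mean = artist_stats["mean_pop"].idxmax()
```

On the real data, `toy` would be replaced by `data`. Whether the sum or the mean is the better measure depends on the question being asked.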
# 5 genres with the highest summed popularity in the dataset
most_popular_genres = data.groupby("top genre", as_index=False)["pop"].sum().sort_values("pop", ascending=False).head(5)
most_popular_genres
# Most popular genres
most_popular_genres_l = most_popular_genres["top genre"].tolist()
# Plot
sns.lineplot(x="year", y="pop", data=data[data["top genre"].isin(most_popular_genres_l)], hue="top genre", ci=None).set(
title = "Top Genres Popularity Over Time")
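When several songs share the same year and genre, `sns.lineplot` aggregates them with its default estimator, the mean, so each line shows the mean popularity per year. The sketch below computes the same numbers as a table with `pivot_table`, using a toy frame with made-up values in place of the real data.

```python
import pandas as pd

# Toy data with made-up values
toy = pd.DataFrame({
    "top genre": ["pop", "pop", "dance pop", "dance pop", "pop"],
    "year": [2010, 2010, 2010, 2011, 2011],
    "pop": [60, 80, 70, 75, 90],
})

# Mean popularity per year (rows) and genre (columns)
yearly_mean = toy.pivot_table(index="year", columns="top genre", values="pop", aggfunc="mean")
print(yearly_mean)
```

For the real data, the same call on `data[data["top genre"].isin(most_popular_genres_l)]` should give the values behind the plotted lines.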