Measurement Problems - IMDB Movie Scoring & Sorting

    IMDB MOVIE SCORING & SORTING

    Weighted Average Ratings

    IMDb publishes weighted vote averages rather than raw data averages. The simplest way to explain it is that although we accept and consider all votes received by users, not all votes have the same impact (or 'weight') on the final rating.

    When unusual voting activity is detected, an alternate weighting calculation may be applied in order to preserve the reliability of our system. To ensure that our rating mechanism remains effective, we do not disclose the exact method used to generate the rating.
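
Because IMDb does not disclose its weighting, the snippet below is only a toy illustration of what "not all votes have the same weight" can mean. The ratings, the weights, and the idea of down-weighting some votes are all made up for this example and are not IMDb's method.

```python
# Hypothetical votes for one movie as (rating, weight) pairs.
# The weights are invented for illustration; IMDb's real weighting is not public.
votes = [
    (10, 0.2), (10, 0.2), (10, 0.2),  # e.g. votes treated as less reliable
    (2, 1.0), (3, 1.0), (8, 1.0),     # ordinary votes
]

# Raw average: every vote counts equally.
raw_average = sum(rating for rating, _ in votes) / len(votes)

# Weighted average: each rating contributes in proportion to its weight.
weighted_average = (
    sum(rating * weight for rating, weight in votes)
    / sum(weight for _, weight in votes)
)

print(f"raw average:      {raw_average:.2f}")
print(f"weighted average: {weighted_average:.2f}")
```

With these made-up numbers the down-weighted block of 10s pulls the score up less than it does in the raw average, which is the general effect a weighting scheme is meant to have.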

    Importing Modules & Dataset

    import pandas as pd
    import math
    import scipy.stats as st
    from sklearn.preprocessing import MinMaxScaler
    pd.set_option('display.max_columns', None)
    pd.set_option('display.expand_frame_repr', False)
    pd.set_option('display.float_format', lambda x: '%.5f' % x)
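    # Load the raw metadata and work on a copy so the original frame stays untouched
    df_ = pd.read_csv("movies_metadata.csv", low_memory=False)
    df = df_.copy()
    # Keep only the columns needed for scoring
    df = df[["title", "vote_average", "vote_count"]]
    df.head()
    df.shape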

    Sorting by Average of Votes

    df.sort_values("vote_average", ascending=False).head(20)

    When the movies are sorted by average vote alone, the top entries have vote counts that are too low to be trustworthy. To improve this result, we can set a lower limit on the vote count.

    df["vote_count"].describe([0.10, 0.25, 0.50, 0.70, 0.80, 0.90, 0.95, 0.99]).T
    # Sort by average vote, keeping only movies above a minimum vote count
    df[df["vote_count"] > 400].sort_values("vote_average", ascending=False).head(20)
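
As a small sketch of the same filter-then-sort idea, the example below uses a made-up five-row frame and takes the lower limit from a quantile of `vote_count` instead of a fixed number. The toy data, the `min_votes` name, and the choice of the median as cutoff are assumptions for illustration only.

```python
import pandas as pd

# Toy data, invented for this example (not from movies_metadata.csv).
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "vote_average": [10.0, 9.5, 8.1, 7.9, 6.0],
    "vote_count": [1, 3, 1200, 800, 50],
})

# Sorting by average alone puts the barely-voted movies A and B on top.
naive = toy.sort_values("vote_average", ascending=False)

# Derive the lower limit from the data; the 0.50 quantile is an arbitrary choice here.
min_votes = toy["vote_count"].quantile(0.50)

# Keep only movies above the limit, then sort by average vote.
ranked = toy[toy["vote_count"] > min_votes].sort_values("vote_average", ascending=False)
print(ranked)
```

Which quantile (or fixed number) to use is a judgment call; the percentile table from `describe` is one way to inform it.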

    This result is still not good enough. Sorting the movies by both vote average and vote count may work better.

    # First, rescale the vote counts to a 1-10 range so they are comparable to vote_average.
    df["vote_count_score"] = MinMaxScaler(feature_range=(1, 10)).fit_transform(df[["vote_count"]])
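
To make the rescaling step concrete, here is a minimal sketch of the min-max formula written out by hand with pandas: `(x - min) / (max - min) * (high - low) + low`. The helper name `scale_to_range` and the sample values are hypothetical; on real data the `MinMaxScaler` call is the one to use.

```python
import pandas as pd

def scale_to_range(series, low=1.0, high=10.0):
    """Linearly map a numeric Series onto [low, high] (min-max scaling)."""
    span = series.max() - series.min()
    if span == 0:
        # All values are identical; avoid dividing by zero.
        return pd.Series(low, index=series.index, dtype=float)
    return (series - series.min()) / span * (high - low) + low

# Made-up vote counts for illustration.
votes = pd.Series([0, 250, 500, 1000], name="vote_count")
scaled = scale_to_range(votes)
print(scaled.tolist())
```

The smallest count maps to 1 and the largest to 10, with everything else placed proportionally in between, so a single extremely popular movie can compress the scores of all the others.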