MLB Outfield dominance

    Can a single player dominate the entire outfield?

    An analytical look at Major League Baseball stats and awards by Michael Troy McGrath

    How we got here

    I was sitting at home one night watching baseball with my dad when the broadcast announcers started talking about an outfielder who had won a “Gold Glove” award at one outfield position and might be on the way to winning the same award at a different one. That’s when the gears in my head started turning, and I asked myself,

    “Has a player ever won a Gold Glove Award in all 3 outfield positions?”

    The trouble with finding an official answer is that for most of the MLB Gold Glove Award’s history, the outfield award was not handed out for each individual outfield spot (Left, Center, Right) but to the three best outfielders in each league (American and National).

    This led to years like 2007, when 6 awards went to players who spent the majority of that season in center field, a position that might offer more chances to pad defensive stats than their teammates in left and right field get. This supposed inequality between the positions is what led to the awards being split by outfield position in 2011, but that doesn’t really help answer our question. I had to hit the books, so to speak…

    History of the Gold Glove

    The Gold Glove Award is given each year to the Major League Baseball players voted by managers and coaches to have given the best individual fielding performance at their respective positions. In its first year, only 9 awards were handed out, one to the top player at each position across all of MLB. Since then, the award has been split into 18, one for the top player at each position in the NL and in the AL (with exceptions in 1985, 2007, and 2018 for ties in the voting). For the first four seasons, the outfield award was also handed out by individual position, but it was grouped into a single “Outfielder” award for the 1961-2010 seasons.

    Sourcing the Data

    I started by obtaining the most up-to-date stats and records I could from the Lahman Baseball Database (http://www.baseball1.com), which covers every season from 1871 through 2021. It is an incredibly well-organized dataset with plenty of tables for me to use.

    import pandas as pd
    
    # Load in List of Awards handed out in MLB history
    
    awards = pd.read_csv('AwardsPlayers.csv')
    
    awards.head(25)
    
    # Find the unique entries in the awardID column
    
    awards.awardID.unique()
    
    # Keep only the Gold Glove awards
    
    gold_glove = awards[awards['awardID'] == 'Gold Glove']
    
    gold_glove.head()
    
    # Count how many Gold Gloves were awarded at each position
    
    gold_glove.notes.value_counts()
    
    # Look for anomalies in the number of 'OF' (outfielder) awards given out each year
    
    import matplotlib.pyplot as plt
    import numpy as np
    
    OF_gold_glove = gold_glove[gold_glove['notes'] == 'OF']
    
    OF_gg = OF_gold_glove.groupby('yearID')[["notes"]].count()
    
    OF_gg.plot(kind='bar')
    plt.show()
    OF_gold_glove.head(10)
    
    # Set aside the Gold Gloves awarded at an individual outfield position (LF, CF, RF) for now
    
    LF_gold_glove = gold_glove['notes'] == 'LF'
    CF_gold_glove = gold_glove['notes'] == 'CF'
    RF_gold_glove = gold_glove['notes'] == 'RF'
    
    OF2_gold_glove = gold_glove[(LF_gold_glove) | (CF_gold_glove) | (RF_gold_glove)]
    OF2_gold_glove.head(25)
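
    The bar chart of 'OF' awards per year shows any anomalies visually. As a rough cross-check, the years whose count differs from the expected total can also be listed directly. This is a minimal sketch on made-up rows, not the Lahman data: the toy frame, the player IDs, and the name expected_per_year are illustrative, and the value 6 assumes three grouped outfield awards in each of the two leagues.

```python
import pandas as pd

# Toy stand-in for OF_gold_glove; player IDs and years are made up for illustration.
toy_of = pd.DataFrame({
    "playerID": ["a01", "b01", "c01", "d01", "e01", "f01",
                 "a01", "b01", "c01", "d01", "e01", "f01", "g01"],
    "yearID": [2000] * 6 + [2001] * 7,
    "notes": ["OF"] * 13,
})

# Assumption: three grouped outfield awards per league, two leagues.
expected_per_year = 6

# Count the awards in each year, then keep only the years that deviate.
awards_per_year = toy_of.groupby("yearID").size()
anomalies = awards_per_year[awards_per_year != expected_per_year]
print(anomalies)
```

    On the real data, the same filter could be applied to the yearly counts of OF_gold_glove, which would point at seasons with ties or missing rows.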

    Starting to clean the data

    I began my research by pulling up a list of all the individual awards handed out in MLB history and filtering it down to the Gold Glove awards. I then split that data into 2 further sets: one for Gold Gloves awarded to “Outfielders” and one for Gold Gloves awarded at an individual outfield position (LF, CF, RF). I put the latter set aside and focused on what to do with the “Outfielder” awards.
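
    The two-way split described above can be sketched on a few made-up rows. This is an illustration only: the toy frame and its player IDs are hypothetical, and it uses isin as a compact alternative to chaining three separate masks with |, which should select the same rows.

```python
import pandas as pd

# Toy stand-in for the Gold Glove rows; IDs are made up for illustration.
toy_gg = pd.DataFrame({
    "playerID": ["a01", "b01", "c01", "d01", "e01"],
    "yearID": [1958, 1975, 1990, 2011, 2012],
    "notes": ["LF", "OF", "OF", "CF", "SS"],
})

# Grouped "Outfielder" awards.
grouped_mask = toy_gg["notes"] == "OF"
# Awards tied to one specific outfield position.
split_mask = toy_gg["notes"].isin(["LF", "CF", "RF"])

toy_grouped = toy_gg[grouped_mask]
toy_split = toy_gg[split_mask]

print(toy_grouped)
print(toy_split)
```

    Rows for non-outfield positions, such as the 'SS' row here, fall into neither set.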

    # Loading in all of the split outfielder statistics
    
    OF_appear = pd.read_csv('FieldingOFsplit.csv')
    OF_appear.head(10)
    
    # Total the outs each player recorded at each outfield position in each year.
    # In theory, an award in a given year should belong to the player's most-played position.
    
    OF_appear_outs = OF_appear.groupby(['playerID', 'yearID', 'POS'])[['InnOuts']].sum()
    OF_appear_outs.head(50)
    
    # Reset the index so that the player ID is in its own column
    
    OF_appear_outs_reset = OF_appear_outs.reset_index().sort_values(['yearID', 'playerID']).reset_index().drop(columns='index')
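    OF_appear_outs_reset.head(25)
    
    # Find the number of outs at each player's most-played position in a given year.
    # The max is taken over the per-position totals rather than the raw rows, because a
    # season can be spread over several raw rows (for example, one per stint) and the raw
    # max would then fail to match the summed totals in the merge that follows.
    
    OF_appear_outs_max = OF_appear_outs_reset.groupby(['playerID', 'yearID'])[['InnOuts']].max().reset_index().sort_values(['yearID', 'playerID']).reset_index(drop=True)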
    OF_appear_outs_max.head(50)
    
    # Merge the max back onto the per-position totals so that each max is labeled with its POS
    
    OF_appear_outs_max2 = OF_appear_outs_reset.merge(OF_appear_outs_max, on=['playerID', 'yearID', 'InnOuts'])
    OF_appear_outs_max2.head(50)
    
    # Merge each year's Gold Glove winners with the position where they played the most outs that year
    
    OF_gold_glove2 = OF_gold_glove.merge(OF_appear_outs_max2, on=['playerID', 'yearID']).drop(columns=['notes', 'InnOuts', 'lgID', 'tie'])
    OF_gold_glove2.head(25)
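
    The "most-played position" step can also be sketched on a few made-up rows. This is a minimal illustration, not the analysis itself: the toy frame, player IDs, and out totals are hypothetical, and it swaps the max-then-merge approach for idxmax on the per-position totals. Player "x01" has two CF rows in the same season, which is why the outs are summed per position before picking the largest.

```python
import pandas as pd

# Toy stand-in for FieldingOFsplit; player "x01" has two rows at CF in 2005
# (for example two stints), so the per-position total must be summed first.
toy_fielding = pd.DataFrame({
    "playerID": ["x01", "x01", "x01", "y01", "y01"],
    "yearID": [2005, 2005, 2005, 2005, 2005],
    "POS": ["CF", "CF", "RF", "LF", "RF"],
    "InnOuts": [1500, 1200, 2000, 3000, 600],
})

# Total outs per player, season, and position.
totals = (
    toy_fielding
    .groupby(["playerID", "yearID", "POS"], as_index=False)["InnOuts"]
    .sum()
)

# idxmax returns the row label of the largest total within each player-season.
top_rows = totals.groupby(["playerID", "yearID"])["InnOuts"].idxmax()
primary_pos = totals.loc[top_rows].reset_index(drop=True)
print(primary_pos)
```

    One design difference to keep in mind: idxmax keeps a single row per player-season even when two positions are tied on outs, while merging on the max value keeps every tied row.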