Competition - Abalone Seafood Farming

    Can you estimate the age of an abalone?

    📖 Background

    You are working as an intern for an abalone farming operation in Japan. For operational and environmental reasons, it is important to estimate the age of abalones when they go to market.

    Determining an abalone's age involves counting the number of rings in a cross-section of the shell through a microscope. Since this method is somewhat cumbersome and complex, you are interested in helping the farmers estimate the age of the abalone using its physical characteristics.

    Data Preparation

    Cleaning

    Let's make sure this data is clean!


    The data appears to be clean! Duplicate rows are not a concern for this dataset, so we won't check for them.
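The cleaning checks themselves are hidden in the notebook, so here is a minimal sketch of what such checks could look like. It assumes the data sits in a pandas DataFrame with the column names used later in this report (`sex`, `height`, `whole_wt`, ...). The helper `cleaning_report` is hypothetical, and the three-row sample is made up purely to show the output; it is not the competition data.

```python
import pandas as pd


def cleaning_report(df: pd.DataFrame) -> dict:
    """Summarize a few basic data-quality checks (hypothetical helper)."""
    numeric = df.select_dtypes("number")
    return {
        # Number of missing values in each column
        "missing": df.isna().sum().to_dict(),
        # Numeric columns holding zero or negative values, which would be
        # implausible for physical measurements such as height or weight
        "non_positive": [c for c in numeric.columns if (numeric[c] <= 0).any()],
        # Distinct category labels, to catch typos such as "m" vs "M"
        "sex_levels": sorted(df["sex"].unique()),
    }


# Tiny made-up sample, NOT the competition data
sample = pd.DataFrame({
    "sex": ["M", "F", "I"],
    "height": [0.095, 0.0, 0.080],
    "whole_wt": [0.514, 0.677, 0.205],
})
report = cleaning_report(sample)
print(report)
```

On this toy sample the zero height is flagged under `non_positive`; on real data, a flag like that would be worth inspecting before modeling.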

    Exploratory Data Analysis

    Let's take a quick look at all the variables and how they relate to one another.

    import seaborn as sns
    sns.pairplot(abalone, hue='sex')  # pairwise plots of every numeric column, colored by sex

    Since age is a direct linear function of the ring count (age = rings + 1.5) and age is our target, both age and rings should be excluded from the feature set. Height appears to be one of the most linear features when plotted against age. It is also abundantly clear that the infant (I) distribution is drastically different from the male (M) and female (F) distributions. What if we separated infants from adults?
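The two ideas in the paragraph above, keeping `age` and `rings` out of the features and treating infants separately from adults, can be sketched in pandas as follows. This is an illustration, not the author's code: the helpers `split_features_target` and `corr_with_age` are hypothetical, Pearson correlation is used as a simple numeric stand-in for "looks linear", and the six-row sample is invented.

```python
import pandas as pd


def split_features_target(df: pd.DataFrame):
    """Separate the target (age) from the features (hypothetical helper).

    'rings' is dropped along with 'age' because age = rings + 1.5, so keeping
    it would leak the answer into the features. errors="ignore" lets this work
    even after 'rings' has already been dropped from the DataFrame.
    """
    X = df.drop(columns=["rings", "age"], errors="ignore")
    y = df["age"]
    return X, y


def corr_with_age(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of every numeric feature with age, highest first."""
    X, y = split_features_target(df)
    return X.select_dtypes("number").corrwith(y).sort_values(ascending=False)


# Tiny made-up sample, NOT the competition data
sample = pd.DataFrame({
    "sex": ["I", "I", "I", "M", "F", "M"],
    "height": [0.03, 0.05, 0.07, 0.12, 0.14, 0.15],
    "whole_wt": [0.05, 0.20, 0.15, 0.90, 0.80, 1.10],
    "rings": [4, 6, 8, 10, 12, 15],
})
sample["age"] = sample["rings"] + 1.5  # relationship stated above

# Look at infants separately from adults (M and F)
infants = sample[sample["sex"] == "I"]
adults = sample[sample["sex"].isin(["M", "F"])]
print(corr_with_age(infants))
print(corr_with_age(adults))
```

Comparing the two printed rankings is one quick way to check whether the infant/adult split changes which features track age most closely.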

    # Drop rings: age is derived from it, so it would leak the target
    abalone = abalone.drop(columns='rings')
    

    The first goal is to investigate the weight-age relationship for each of the three sex categories: M (male), F (female), and I (infant). The prompt does not specify which weight to use, so let's look at all four!

    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Age vs. each of the four weight columns, colored by sex
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    sns.scatterplot(data=abalone, x='age', y='whole_wt', hue='sex', ax=axes[0, 0])
    sns.scatterplot(data=abalone, x='age', y='shucked_wt', hue='sex', ax=axes[0, 1])
    sns.scatterplot(data=abalone, x='age', y='viscera_wt', hue='sex', ax=axes[1, 0])
    sns.scatterplot(data=abalone, x='age', y='shell_wt', hue='sex', ax=axes[1, 1])
    fig.tight_layout()
    plt.show()
    
    

    From these plots, infants clearly tend to be lighter than males or females, but there is no obvious difference in weight between males and females. Weight also varies widely at any given age! To get a clearer picture, we should look at the mean weight per sex category at each age, with standard-deviation bands around it. The roughly linear relationship between weight and age starts to break down at around 8 years, which is important for our future predictions.
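The next step proposed above (mean weight per sex and age with standard-deviation bands), plus a quick check of the observation about 8 years, could be sketched like this. The helpers `weight_bands` and `corr_below_above` are hypothetical; the `cutoff=8.0` default simply mirrors the age mentioned in the text, and the sample rows are invented. The `lower`/`upper` columns are meant to be shaded with matplotlib's `Axes.fill_between(x, y1, y2)`, one sex at a time.

```python
import pandas as pd


def weight_bands(df: pd.DataFrame, col: str = "whole_wt") -> pd.DataFrame:
    """Mean +/- one standard deviation of a weight column per (sex, age)."""
    stats = (
        df.groupby(["sex", "age"])[col]
        .agg(["mean", "std", "count"])
        .reset_index()
    )
    # pandas std uses ddof=1, so a group containing a single animal gets NaN
    stats["lower"] = stats["mean"] - stats["std"]
    stats["upper"] = stats["mean"] + stats["std"]
    return stats


def corr_below_above(df: pd.DataFrame, col: str = "whole_wt", cutoff: float = 8.0):
    """Pearson r between age and a weight column, below vs. at/above a cutoff age."""
    young = df[df["age"] < cutoff]
    old = df[df["age"] >= cutoff]
    return young["age"].corr(young[col]), old["age"].corr(old[col])


# Tiny made-up sample, NOT the competition data
sample = pd.DataFrame({
    "sex": ["M", "M", "M", "M", "M", "F"],
    "age": [5.5, 6.5, 7.5, 10.5, 10.5, 12.5],
    "whole_wt": [0.2, 0.4, 0.6, 1.0, 1.2, 0.9],
})
bands = weight_bands(sample)
print(bands)
print(corr_below_above(sample))
```

Keeping the `count` column makes it easy to spot ages where a band rests on only a handful of animals and should not be over-interpreted.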

    Estimating Age from Physical Characteristics