Competition - Abalone Seafood Farming - Volume

    Can you estimate the age of an abalone?

    📖 Background

    You are working as an intern for an abalone farming operation in Japan. For operational and environmental reasons, it is important to estimate the age of the abalones when they go to market.

    Determining an abalone's age involves counting the number of rings in a cross-section of the shell through a microscope. Since this method is somewhat cumbersome and complex, you are interested in helping the farmers estimate the age of the abalone using its physical characteristics.

    # dependencies
    
    import pandas as pd
    import numpy as np
    
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    from sklearn.preprocessing import LabelEncoder
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.metrics import (mean_squared_error, mean_absolute_error, r2_score,
                                 explained_variance_score, confusion_matrix)
    import xgboost as XGB
    
    # display() is built in inside Jupyter; importing it explicitly also works in plain scripts
    from IPython.display import display

    💾 The data

    You have access to the following historical data (source):

    Abalone characteristics:
    • "sex" - M, F, and I (infant).
    • "length" - longest shell measurement.
    • "diameter" - perpendicular to the length.
    • "height" - measured with meat in the shell.
    • "whole_wt" - whole abalone weight.
    • "shucked_wt" - the weight of abalone meat.
    • "viscera_wt" - gut-weight.
    • "shell_wt" - the weight of the dried shell.
    • "rings" - number of rings in a shell cross-section.
    • "age" - the age of the abalone: the number of rings + 1.5.
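    The last two columns are tied by a fixed rule. A minimal sketch of checking that rule, assuming a DataFrame with `rings` and `age` columns as described above; the three sample rows are made up for illustration:

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from abalone.csv.
sample = pd.DataFrame({'rings': [15, 7, 9], 'age': [16.5, 8.5, 10.5]})

# Age is defined as the ring count plus 1.5 years.
expected_age = sample['rings'] + 1.5
print((sample['age'] == expected_age).all())  # True
```

    Because `age` is a fixed shift of `rings`, `rings` cannot be used as a model input: it would reveal the target exactly.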

    Acknowledgments: Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn, and Wes B Ford (1994) "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait", Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288).

    abalone = pd.read_csv('./data/abalone.csv')
    display(abalone)
    
    # Plot each weight measurement against age, one line per sex
    weight_cols = ['whole_wt', 'shucked_wt', 'viscera_wt', 'shell_wt']
    for weight in weight_cols:
        plt.figure(figsize=(10, 5))
        sns.lineplot(data=abalone, x='age', y=weight, hue='sex', palette='Set1')
        plt.title('{} vs. age for each of the three sex categories'.format(weight))
        plt.xlabel('Age')
        plt.ylabel(weight)
        plt.show()
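    Many abalones share the same age, so `sns.lineplot` does not join raw points: by default it draws the mean of `y` at each `x` within each `hue` group, with a confidence band around it. A minimal sketch of the same aggregation in pandas, using a few hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: two males share age 8.5, and the female has two ages.
sample = pd.DataFrame({
    'sex': ['M', 'M', 'F', 'F'],
    'age': [8.5, 8.5, 8.5, 10.5],
    'whole_wt': [0.40, 0.60, 0.50, 0.90],
})

# Mean weight per (sex, age) pair: the points each line passes through.
line_points = sample.groupby(['sex', 'age'])['whole_wt'].mean()
print(line_points)
```

    The two male rows at age 8.5 collapse into a single point at their mean, 0.5.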

    Data Engineering

    To reduce the number of features, we can combine the three size measurements into a single approximate volume feature:

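    # Treat the shell as an ellipsoid. The measurements are full axes, so the semi-axes
    # are half of each: (4/3)*pi*(L/2)*(D/2)*(H/2) = (pi/6)*L*D*H
    abalone['volume'] = (np.pi / 6) * abalone['length'] * abalone['diameter'] * abalone['height']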
    display(abalone[['length', 'diameter', 'height', 'volume']])
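    An ellipsoid with semi-axes a, b, c has volume (4/3)·π·a·b·c. Since `length`, `diameter` and `height` are full extents, each semi-axis is half a measurement, which gives (π/6)·L·D·H; multiplying (4/3)·π by the full measurements overstates the volume eightfold. A minimal sketch, assuming the ellipsoid shape is an acceptable rough approximation of a shell; the helper `ellipsoid_volume` and the sample measurements are hypothetical:

```python
import numpy as np

def ellipsoid_volume(length, diameter, height):
    """Approximate volume of an ellipsoid whose full axes are the three measurements."""
    # Semi-axes are half of each full measurement.
    a, b, c = length / 2, diameter / 2, height / 2
    return (4 / 3) * np.pi * a * b * c

# Hypothetical measurements, in the same units as the dataset.
length, diameter, height = 0.455, 0.365, 0.095
v = ellipsoid_volume(length, diameter, height)

# Equivalent closed form: (pi / 6) * L * D * H.
print(np.isclose(v, (np.pi / 6) * length * diameter * height))  # True

# Plugging full axes into (4/3)*pi*L*D*H gives 8 times the ellipsoid volume.
print(np.isclose((4 / 3) * np.pi * length * diameter * height / v, 8.0))  # True
```

    The function also works element-wise on pandas Series. For tree-based models such as XGBoost, a constant factor does not change the splits, so the difference matters only when the feature is read as a physical volume.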

    Encoding sex with LabelEncoder

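    # LabelEncoder sorts the labels alphabetically: F -> 0, I -> 1, M -> 2
    le = LabelEncoder()
    abalone['sex_enc'] = le.fit_transform(abalone['sex'])
    display(abalone.head())
    abalone_enc = abalone.drop('sex', axis=1)
    # Recode to I = 1, F = 2, M = 3; replacing 2 -> 3 before 0 -> 2 keeps F and M distinct
    abalone_enc['sex_enc'] = abalone_enc['sex_enc'].replace(2, 3).replace(0, 2)
    display(abalone_enc.head())
    # Drop rings (age = rings + 1.5, so it would leak the target) and the partial weights,
    # leaving whole_wt as the only weight column
    op_data = abalone_enc.drop(["rings", "shucked_wt", "viscera_wt", "shell_wt"], axis=1)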
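    `LabelEncoder` assigns integer codes in sorted label order, so F → 0, I → 1, M → 2, and the two chained `replace` calls turn these into I = 1, F = 2, M = 3. The order of those calls matters. A minimal sketch that reproduces the codes with `np.unique(..., return_inverse=True)`, which sorts labels the same way; the sample labels are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the 'sex' column.
sex = pd.Series(['M', 'F', 'I', 'M', 'I'])

# Sorted unique labels and each row's position among them: F -> 0, I -> 1, M -> 2.
classes, codes = np.unique(sex.to_numpy(), return_inverse=True)
enc = pd.Series(codes)

# Notebook order: move M (2) to 3 first, then F (0) to 2.
remapped = enc.replace(2, 3).replace(0, 2)
mapping = {label: int(code) for label, code in zip(sex, remapped)}
print(classes.tolist())  # ['F', 'I', 'M']
print(mapping)           # {'M': 3, 'F': 2, 'I': 1}

# Reversed order: F becomes 2 and is then swept into 3 along with M.
wrong = enc.replace(0, 2).replace(2, 3)
print(sorted(wrong.unique().tolist()))  # [1, 3]
```

    Passing a dictionary to `Series.map` (for example `{'I': 1, 'F': 2, 'M': 3}`) would reach the same codes in a single, order-independent step.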

    Observing

    plt.figure(figsize=(10,5))
    sns.scatterplot(data=op_data, x='age', y='volume', hue='sex_enc', style='sex_enc', palette='Set1_r')
    plt.show()
    plt.figure(figsize=(5, 2.5))
    sex_count = op_data.sex_enc.value_counts()
    # Label the bars by code rather than by position (I = 1, F = 2, M = 3)
    sex_count.index = sex_count.index.map({1: 'Infant', 2: 'Female', 3: 'Male'})
    sex_count.plot(kind='bar', color=['red', 'gray', 'orange'])
    print(sex_count)
    plt.show()
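    # Distribution of each column, with dashed lines at the 2.5th and 97.5th percentiles
    for col in op_data:
        plt.figure(figsize=(5, 2.5))
        # sns.distplot is deprecated; histplot with a KDE overlay replaces it
        sns.histplot(op_data[col], kde=True, stat='density')
        plt.axvline(x=np.percentile(op_data[col], 2.5), color='r', linestyle='--')
        plt.axvline(x=np.percentile(op_data[col], 97.5), color='r', linestyle='--')
        plt.show()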
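    The dashed lines mark the 2.5th and 97.5th percentiles, so roughly 95% of each column lies between them. A minimal sketch that measures the share of rows outside that band for every column, assuming a numeric DataFrame like `op_data`; the helper `share_outside_band` and the demo data are hypothetical:

```python
import numpy as np
import pandas as pd

def share_outside_band(df, lower=2.5, upper=97.5):
    """Fraction of rows in each column below the lower or above the upper percentile."""
    result = {}
    for col in df.columns:
        lo, hi = np.percentile(df[col], [lower, upper])
        outside = (df[col] < lo) | (df[col] > hi)
        result[col] = outside.mean()
    return pd.Series(result)

# Hypothetical data: the values 1 to 100 in one column.
demo = pd.DataFrame({'value': np.arange(1, 101, dtype=float)})
print(share_outside_band(demo)['value'])  # 0.06
```

    Whether to clip, drop or keep the rows outside the band is a modelling decision; the plots only show where they lie.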