Duplicate of Competition - Abalone Seafood Farming

    Can you estimate abalone age?

    Image(filename='Diseño sin título(1).png')

    1. Introduction

    Abalone is a shellfish considered a delicacy in many parts of the world. It is an excellent source of iron and pantothenic acid, and a nutritious food resource farmed in Australia, America and East Asia. 100 grams of abalone yields more than 20% of the recommended daily intake of these nutrients. The economic value of abalone is positively correlated with its age, so determining the age accurately matters to both farmers and customers when setting its price. However, the current method is costly and inefficient: the age is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope, a laborious task. Other measurements, which are easier to obtain, are therefore used to predict the age. Further information, such as weather patterns and location (hence food availability), may be required to solve the problem. However, for this problem we shall assume that the abalone's physical measurements are sufficient to provide an accurate age prediction.

    Paper objectives:

    1. How does weight change with age for each of the three sex categories?
    2. Can you estimate an abalone's age using its physical characteristics?
    3. Investigate which variables are better predictors of age for abalones.
    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    import matplotlib.lines as lines
    from scipy.stats import iqr
    from skimage import io
    
    from scipy.stats import skew, kurtosis
    pd.set_option("display.max_columns",None) 
    pd.set_option("display.max_rows",None) 
    from sklearn.neighbors import LocalOutlierFactor
    
    
    from warnings import filterwarnings
    filterwarnings('ignore')
    
    sns.set_style('white')
    plt.rcParams['font.family'] = 'monospace'
    
    from scipy.stats import zscore
    from scipy import stats
    from IPython.display import Image
    
    blues = ['#193f6e','#3b6ba5','#72a5d3','#b1d3e3','#e1ebec']
    reds = ['#e61010','#e65010','#e68d10','#e6df10','#c2e610']
    cmap_blues = sns.color_palette(blues)
    cmap_reds = sns.color_palette(reds)
    sns.set_palette(cmap_blues)
    
    print('These are the color palettes I will use:')
    sns.palplot(cmap_blues)
    sns.palplot(cmap_reds)

    2. Data preparation

    2.1 Features of data

    • The dataset has 4177 entries and 10 columns:

    Feature      Data Type    Measurement  Description
    sex          categorical  -            M, F, and I (Infant)
    length       continuous   mm           longest shell measurement
    diameter     continuous   mm           perpendicular to the length
    height       continuous   mm           measured with meat in the shell
    whole_wt     continuous   grams        whole abalone weight
    shucked_wt   continuous   grams        the weight of abalone meat
    viscera_wt   continuous   grams        gut-weight
    shell_wt     continuous   grams        the weight of the dried shell
    rings        continuous   -            number of rings in a shell cross-section
    age          continuous   -            the age of the abalone: the number of rings + 1.5
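
    The relationship between rings and age stated above (age = rings + 1.5) can be sketched as follows. This is a minimal illustration, not the notebook's own code: the DataFrame `sample` and its three rows are hypothetical values made up for the example, and only the column names `sex`, `rings` and `age` come from the feature list.

```python
import pandas as pd

# Hypothetical sample rows for illustration only; the real dataset has 4177 entries.
sample = pd.DataFrame({"sex": ["M", "F", "I"], "rings": [15, 9, 7]})

# Age as defined in the feature list: the number of rings + 1.5
sample["age"] = sample["rings"] + 1.5

print(sample)
```

    Because age is a fixed offset of rings, the two columns carry the same information, so a model predicting age should not be given rings as an input.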

    2.2 General information

    This section covers the general information about the dataset. We first look at the first 5 rows, then review the data type of each column, and finally confirm that there are no duplicate rows and no missing values.
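
    The checks described above could look roughly like the sketch below. The original cells that perform them are not shown here, so this is an assumption about a typical approach, not the author's code. `abalone_demo` is a hypothetical stand-in for the real `abalone` DataFrame, and its values are invented for the example.

```python
import pandas as pd

# Hypothetical stand-in for the `abalone` DataFrame (values are made up).
abalone_demo = pd.DataFrame({
    "sex": ["M", "F", "I", "M"],
    "length": [91.0, 106.0, 66.0, 88.0],
    "rings": [15, 9, 7, 10],
})

# First rows of the dataset (head() returns up to 5 rows by default)
print(abalone_demo.head())

# Data type of each column
print(abalone_demo.dtypes)

# Number of fully duplicated rows
n_duplicates = int(abalone_demo.duplicated().sum())
print("Duplicate rows:", n_duplicates)

# Total number of missing values across all columns
n_missing = int(abalone_demo.isna().sum().sum())
print("Missing values:", n_missing)
```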

    print('💠 Are there missing values?\n')
    bg_color = '#fbfbfb'
    txt_color = '#5c5c5c'
    # check for missing values
    fig, ax = plt.subplots(tight_layout=True, figsize=(12,6))
    
    fig.patch.set_facecolor(bg_color)
    ax.set_facecolor(bg_color)
    
    mv = abalone.isna()
    ax = sns.heatmap(data=mv, cmap=cmap_reds, cbar=False, ax=ax)
    
    ax.set_ylabel('')
    ax.set_yticks([])
    ax.set_xticklabels(labels=mv.columns, size=12,rotation=45)
    ax.tick_params(length=0)
    
    fig.text(
        s=':Missing Values',
        x=0, y=1.1,
        fontsize=17, fontweight='bold',
        color=txt_color,
        va='top', ha='left'
    )
    
    fig.text(
        s='''
        we can't see any ...
        ''',
        x=0, y=1.075,
        fontsize=11, fontstyle='italic',
        color=txt_color,
        va='top', ha='left'
    )
    
    plt.show()

    2.3 Data preprocessing

    2.3.1 Data typology and single visualization

    2.3.1.1 Categorical data

    The only categorical feature is sex. It is divided into three subcategories: male, female and infant. As can be seen, the distribution across the three categories is fairly homogeneous. The noteworthy fact is that the female subcategory has a lower mean than the other two.
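
    One way to check how evenly a categorical feature such as sex is distributed is to count each category and compute its share of the total. The sketch below uses a hypothetical Series `sex_demo` with invented values; it does not reproduce the notebook's figures.

```python
import pandas as pd

# Hypothetical values for illustration; the real column has 4177 entries.
sex_demo = pd.Series(["M", "F", "I", "M", "I", "F", "M"], name="sex")

# Absolute count per category
counts = sex_demo.value_counts()

# Share of each category, rounded to three decimals
shares = sex_demo.value_counts(normalize=True).round(3)

print(counts)
print(shares)
```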