Can you estimate the age of an abalone?
You are working as an intern for an abalone farming operation in Japan. For operational and environmental reasons, it is important to estimate the age of the abalones when they go to market.
Determining an abalone's age involves counting the number of rings in a cross-section of the shell through a microscope. Since this method is somewhat cumbersome and complex, you are interested in helping the farmers estimate the age of the abalone using its physical characteristics.
💾 The data
You have access to the following historical data (source):
- "sex" - M, F, and I (infant).
- "length" - longest shell measurement.
- "diameter" - perpendicular to the length.
- "height" - measured with meat in the shell.
- "whole_wt" - whole abalone weight.
- "shucked_wt" - the weight of abalone meat.
- "viscera_wt" - gut-weight.
- "shell_wt" - the weight of the dried shell.
- "rings" - number of rings in a shell cross-section.
- "age" - the age of the abalone: the number of rings + 1.5.
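Because age is defined directly from the ring count, the conversion is a one-line formula. The sketch below (the function name rings_to_age is hypothetical, not from the original notebook) makes it explicit:

```python
def rings_to_age(rings: int) -> float:
    """Age as defined for this dataset: the ring count plus 1.5."""
    return rings + 1.5


# An abalone with 9 rings is assigned an age of 10.5
print(rings_to_age(9))  # 10.5
```

Since age is a fixed offset of rings, the two columns carry exactly the same information, so rings cannot be used as an input when the goal is to estimate age from the other physical characteristics.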
Acknowledgments: Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn, and Wes B Ford (1994) "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait", Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288).
💪 Competition challenge
Create a report that covers the following:
- How does weight change with age for each of the three sex categories?
- Can you estimate an abalone's age using its physical characteristics?
- Investigate which variables are better predictors of age for abalones.
Estimate the Age of Abalone:
Importing all the needed libraries and creating a dataframe from the abalone dataset.
import pandas as pd
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import matplotlib.pyplot as plt
Creating a dataframe for the abalone dataset and separating the input and output data columns.
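# creating a dataframe of the abalone dataset
abalone = pd.read_csv('./data/abalone.csv')

# separating input (physical characteristics) and output (age) columns;
# 'rings' is dropped from the inputs too, because age = rings + 1.5 would leak the answer
characteristics = abalone.drop(columns=['age', 'rings'])
age = abalone['age']

# printing abalone dataframe
abalone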
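Most estimation methods expect purely numeric inputs, while sex is stored as the letters M, F and I. A minimal sketch of one common option, one-hot encoding with pd.get_dummies, assuming the column names listed in the data section (the helper name split_features is hypothetical), also keeps rings out of the inputs:

```python
import pandas as pd


def split_features(df: pd.DataFrame):
    """Split into numeric inputs X and target y, excluding the leaky 'rings' column."""
    # 'rings' is dropped because age = rings + 1.5 would leak the target
    X = df.drop(columns=["age", "rings"])
    # One-hot encode sex (F, I, M) into indicator columns sex_F, sex_I, sex_M
    X = pd.get_dummies(X, columns=["sex"], dtype=int)
    y = df["age"]
    return X, y


# Illustrative rows only, not taken from the real dataset
toy = pd.DataFrame({
    "sex": ["M", "F", "I"],
    "length": [0.455, 0.53, 0.33],
    "rings": [15, 9, 7],
})
toy["age"] = toy["rings"] + 1.5
X, y = split_features(toy)
print(list(X.columns))  # ['length', 'sex_F', 'sex_I', 'sex_M']
```

On the real data this would be called as split_features(abalone); one-hot encoding is just one choice here, and dropping sex or using a single "is infant" flag would also work.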
Relationship between age and other characteristics
Since we are looking for the relationship between age and the other physical characteristics, we first check whether age is correlated with each of them.
Finding the correlation using the Pearson method.
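A minimal sketch of one way to compute these correlations with pandas, assuming pandas 1.5 or later (for the numeric_only argument of DataFrame.corr) and using a hypothetical helper name age_correlations, ranks every numeric column by its Pearson correlation with age:

```python
import pandas as pd


def age_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of every numeric column with 'age', sorted descending."""
    # numeric_only=True skips the categorical 'sex' column (pandas >= 1.5)
    corr = df.corr(method="pearson", numeric_only=True)
    # Keep only the 'age' column and drop its self-correlation of 1.0
    return corr["age"].drop("age").sort_values(ascending=False)


# Illustrative rows only, not taken from the real dataset
toy = pd.DataFrame({
    "sex": ["M", "F", "I", "M"],
    "shell_wt": [0.15, 0.21, 0.05, 0.30],
    "rings": [9, 10, 7, 15],
})
toy["age"] = toy["rings"] + 1.5
result = age_correlations(toy)
print(result)
```

On the real data this would be called as age_correlations(abalone). Because age is a fixed offset of rings, rings always comes out on top with a correlation of 1.0.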
As we can see, all the characteristics are positively correlated with the abalone's age. The weakest correlation is between shucked_wt and age, while the correlation between rings and age is perfect (1.0), which is expected because age is defined as the number of rings + 1.5.
Now let's break this down into parts to look at it in detail.
Relationship between Age and Weight
Data Transformation: saving the data for each sex in separate variables, then calculating the average (mean) of all the fields by age for each sex category, i.e., infant, female and male.
# separating out each sex's data
infants_dataset = abalone[abalone['sex'] == 'I']
female_dataset = abalone[abalone['sex'] == 'F']
male_dataset = abalone[abalone['sex'] == 'M']

# calculating the mean of abalone characteristics by sex and age
abalone_age_mean = abalone.groupby(['sex', 'age']).mean()

# calculating the mean of abalone characteristics by age for each sex separately;
# numeric_only=True skips the text 'sex' column (pandas 2.x raises a TypeError otherwise)
infant_age_mean = infants_dataset.groupby('age').mean(numeric_only=True)
female_age_mean = female_dataset.groupby('age').mean(numeric_only=True)
male_age_mean = male_dataset.groupby('age').mean(numeric_only=True)
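The first challenge question asks how weight changes with age for each sex. One compact way to get that view, sketched here with DataFrame.pivot_table (the helper name mean_weight_by_age is hypothetical), puts age on the rows and one column per sex, so the three trends can be read side by side:

```python
import pandas as pd


def mean_weight_by_age(df: pd.DataFrame, weight_col: str = "whole_wt") -> pd.DataFrame:
    """Mean of one weight column, with age as rows and sex (F, I, M) as columns."""
    return df.pivot_table(index="age", columns="sex", values=weight_col, aggfunc="mean")


# Illustrative rows only, not taken from the real dataset
toy = pd.DataFrame({
    "sex": ["F", "F", "I", "M"],
    "age": [10.5, 10.5, 8.5, 10.5],
    "whole_wt": [0.8, 1.0, 0.2, 0.7],
})
table = mean_weight_by_age(toy)
print(table)
```

Passing weight_col="shucked_wt" (or any other weight column) gives the same view for that measurement; age and sex combinations with no samples appear as NaN.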
Now let's look at how age relates to the other characteristics for each abalone sex.
Displaying the average of abalone characteristics by sex and age.
- The darker the color, the smaller the value relative to the other values in the same column.
abalone_age_mean = abalone.groupby(['sex', 'age']).mean()
abalone_age_mean.style.background_gradient(
    cmap=sns.color_palette('rocket', as_cmap=True)
)
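The shading is relative, not absolute: with its default axis=0, Styler.background_gradient scales each column separately between that column's minimum and maximum. The sketch below reproduces that per-column min-max scaling in plain pandas, assuming the default low=0 and high=0 (the helper name gradient_positions is hypothetical), to show which position on the colormap each cell receives:

```python
import pandas as pd


def gradient_positions(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column min-max scaling to [0, 1]; 0 maps to the darkest end of 'rocket'."""
    # A constant column would give 0/0 = NaN here
    return (df - df.min()) / (df.max() - df.min())


# Illustrative values only, not taken from the real dataset
toy = pd.DataFrame({"length": [0.3, 0.5, 0.7], "whole_wt": [0.1, 1.0, 2.0]})
positions = gradient_positions(toy)
print(positions)
```

This is why a weight column and a length column can show the same shade for very different raw numbers: each is compared only against itself.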
Displaying the average of abalone characteristics by age, using a different color for each sex category:
- red for females,
- green for infants, and
- blue for males.
In these palettes, a darker shade means a larger value within the column.
cm = sns.color_palette('Reds', as_cmap=True)
# female characteristics mean by age
female_age_mean.style.background_gradient(cmap=cm)

cm = sns.color_palette('Greens', as_cmap=True)
# infants characteristics mean by age
infant_age_mean.style.background_gradient(cmap=cm)

cm = sns.color_palette('Blues', as_cmap=True)
# male characteristics mean by age
male_age_mean.style.background_gradient(cmap=cm)
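The colored tables show the per-sex trends visually. To put a single number on each, a minimal sketch (the helper name per_sex_age_correlation is hypothetical; the column names are assumed to match the data section) computes the Pearson correlation of every numeric characteristic with age separately within each sex:

```python
import pandas as pd


def per_sex_age_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation with 'age' for each numeric column, one column per sex."""
    result = {}
    for sex, group in df.groupby("sex"):
        # Drop the categorical key so corr() sees only numeric columns
        corr = group.drop(columns="sex").corr(method="pearson")
        result[sex] = corr["age"].drop("age")
    return pd.DataFrame(result)


# Illustrative rows only, not taken from the real dataset
toy = pd.DataFrame({
    "sex": ["F", "F", "F", "I", "I", "I"],
    "whole_wt": [0.6, 0.9, 1.2, 0.1, 0.2, 0.25],
    "rings": [8, 10, 12, 5, 7, 9],
})
toy["age"] = toy["rings"] + 1.5
corr_by_sex = per_sex_age_correlation(toy)
print(corr_by_sex)
```

On the real data this would be called as per_sex_age_correlation(abalone), giving one column each for F, I and M; comparing a row across those columns shows whether a characteristic tracks age more closely for one sex than another.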