Is it possible to estimate the age of an abalone?
📖 Background
Japan has a well-developed seafood market, and abalone farming is a significant part of it. For operational and environmental reasons, it is important to estimate the age of abalones when they go to market.
Determining an abalone's age involves counting the number of rings in a cross-section of the shell through a microscope. Since this method is somewhat cumbersome and complex, you are interested in helping the farmers estimate the age of the abalone using its physical characteristics.
A key design decision is whether to predict the age of a live abalone or to use all of the recorded characteristics, as for an abalone already prepared for the seafood market. This analysis focuses on predictions based on all of the data; predicting age only from measurements that can be taken on a live abalone is a promising direction for future study.
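The two designs can be made concrete as two feature sets. The following is a minimal sketch, not the method used in this analysis: the column names come from the data description, while the assignment of columns to a "live" group (and the inclusion of sex in it) is an assumption made for illustration. The helper name `select_features` is hypothetical. Note that `rings` is left out of both sets, because `age` is defined as rings + 1.5 and including it would leak the target.

```python
# Assumption: these measurements can be taken without shucking the animal.
LIVE_FEATURES = ["sex", "length", "diameter", "height", "whole_wt"]
# Assumption: these weights are only available after the abalone is processed.
POST_HARVEST_FEATURES = ["shucked_wt", "viscera_wt", "shell_wt"]


def select_features(columns, live_only=False):
    """Return the predictor columns for the chosen analysis design.

    `rings` and `age` are never returned: age is the target and
    rings determines it exactly.
    """
    wanted = list(LIVE_FEATURES)
    if not live_only:
        wanted += POST_HARVEST_FEATURES
    # Keep only columns that are actually present, in a fixed order.
    return [name for name in wanted if name in columns]


all_columns = ["sex", "length", "diameter", "height", "whole_wt",
               "shucked_wt", "viscera_wt", "shell_wt", "rings", "age"]
full_design = select_features(all_columns)
live_design = select_features(all_columns, live_only=True)
```

With a pandas DataFrame, `df[select_features(df.columns, live_only=True)]` would give the reduced set of predictors for the live-abalone variant.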
💾 The data
The dataset was made from the following historical data (source):
Abalone characteristics
Index | Variable | Explanation |
---|---|---|
0 | sex | M, F, and I (infant) |
1 | length | longest shell measurement |
2 | diameter | perpendicular to the length |
3 | height | measured with meat in the shell |
4 | whole_wt | whole abalone weight |
5 | shucked_wt | the weight of abalone meat |
6 | viscera_wt | gut-weight |
7 | shell_wt | the weight of the dried shell |
8 | rings | number of rings in a shell cross-section |
9 | age | the age of the abalone: the number of rings + 1.5 |
Acknowledgments: Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn, and Wes B Ford (1994) "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait", Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288).
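The relationship between the last two columns of the table above is a fixed offset, so either one can be recovered from the other. A minimal sketch of that conversion follows; the function names and the sample ring counts are hypothetical and not taken from the dataset.

```python
RING_OFFSET = 1.5  # from the data description: age = rings + 1.5


def rings_to_age(rings):
    """Convert a ring count to age using the dataset's definition."""
    return rings + RING_OFFSET


def age_to_rings(age):
    """Invert the definition to recover the ring count from an age."""
    return age - RING_OFFSET


# Hypothetical ring counts, for illustration only.
example_rings = [7, 10, 15]
example_ages = [rings_to_age(r) for r in example_rings]
```

Because the mapping is exact, a model that predicts `rings` is equivalent to one that predicts `age`, and `rings` should not be used as a predictor of `age`.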
Imports and settings
%%capture
%pip install synthia pyvinecopulib tensorflow seaborn
import pandas as pd
import seaborn as sns
import seaborn.objects as so
from seaborn import axes_style
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import synthia as syn
import pyvinecopulib as pv
print(sns.__version__)
import os
# Silence TensorFlow's C++ log output; must be set before TensorFlow is imported
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
%matplotlib inline
# Plot styling: whitegrid theme, dotted grid lines, no top/right spines
rc_params = {**axes_style('whitegrid'),
             'legend.markerscale': 3,
             'grid.linestyle': ':',
             'axes.spines.top': False,
             'axes.spines.right': False}
mpl.rcParams.update(rc_params)
# mpl.cm.get_cmap was removed in Matplotlib 3.9; use the colormap registry instead
cmap = mpl.colormaps['plasma']
# Make NumPy and pandas printouts easier to read
np.set_printoptions(precision=2, suppress=True)
pd.set_option('display.precision', 2)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Dense, Concatenate,
                                     Embedding, Flatten, Normalization)
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
Read and process the dataset
abalone = pd.read_csv('./data/abalone.csv',
                      dtype={'sex': 'category'})
abalone.info()
abalone.sample(n=10)
Data exploration
abalone.describe().T
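Summary statistics can be complemented by a row-level consistency check. The sketch below is one possible set of rules, assuming the column names from the data description; the function name, the tolerance, and the sample records are hypothetical and are not drawn from the dataset. It flags rows where age does not equal rings + 1.5, where a shell dimension is not positive, or where the component weights add up to more than the whole weight.

```python
def find_inconsistent_rows(rows, tol=1e-9):
    """Return (row_index, reason) pairs for rows that break basic sanity rules."""
    problems = []
    for i, row in enumerate(rows):
        # age is defined as rings + 1.5 in the data description
        if abs(row["age"] - (row["rings"] + 1.5)) > tol:
            problems.append((i, "age != rings + 1.5"))
        # physical dimensions should be strictly positive
        if min(row["length"], row["diameter"], row["height"]) <= 0:
            problems.append((i, "non-positive dimension"))
        # meat, gut, and shell together should not outweigh the whole animal
        parts = row["shucked_wt"] + row["viscera_wt"] + row["shell_wt"]
        if parts > row["whole_wt"] + tol:
            problems.append((i, "component weights exceed whole weight"))
    return problems


# Hypothetical records for illustration only.
sample_rows = [
    {"length": 0.5, "diameter": 0.4, "height": 0.1, "whole_wt": 0.6,
     "shucked_wt": 0.25, "viscera_wt": 0.1, "shell_wt": 0.2,
     "rings": 9, "age": 10.5},
    {"length": 0.5, "diameter": 0.4, "height": 0.0, "whole_wt": 0.6,
     "shucked_wt": 0.25, "viscera_wt": 0.1, "shell_wt": 0.2,
     "rings": 9, "age": 11.0},
]
issues = find_inconsistent_rows(sample_rows)
```

On the notebook's DataFrame the same check could be run as `find_inconsistent_rows(abalone.to_dict('records'))`.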