Workspace
Subhradeep Rang/

KHOJ - Know Your High Court Judges

Exploratory Data Analysis on KHOJ using Python

👉 Introduction

In this notebook, you will analyze an interesting dataset named KHOJ - Know Your High Court Judges. It contains information about the judges of all High Courts in India from 1993 to 2021.

💾 The Data

The data contains 1708 rows and 44 columns, but you don't need all of them. The table below describes the columns you will use in this analysis.

Column | Description
Name of the Judge | The name of the judge.
Gender | The gender of the judge.
Date of Birth | The date on which the judge was born.
Date of Appointment | The date on which the person was elevated as a Judge of any High Court (appointment as an Additional Judge is also considered here).
Date of Retirement | The date on which the person demits office as a Judge of a High Court or of the Supreme Court (if elevated to it).
If appointed Chief Justice in any High Court | Categorical column specifying whether the judge was appointed as Chief Justice.
If appointed to the Supreme Court | Categorical column specifying whether the judge was appointed to the Supreme Court.
Foreign Degree in Law | Whether the judge has a foreign degree in Law.
Post-Graduate in Law | Whether the judge has a PG degree in Law.
Post-Graduate in another subject | Whether the judge has a PG degree in another subject.
Graduation Specialization | The subject the judge chose for graduation.
fileTitle | The name of the court.

If you want to know about all the columns of this data, you should check this link. If you want to analyze the data of a specific court, you can check this website, as it contains all the information about the judges of all courts.

What to do🤔?

You know about the data. Now ask yourself: what do you want to learn from it? Below are some questions that I want this data to answer.

  • What is the average age of judges when they are appointed as a Judge of any High Court?
  • What is the average retirement age of a High Court Judge?
  • What is the average duration of service as a Judge?
  • What is the ratio of male to female judges?
  • What are the educational qualifications of the judges? This has four subparts.
    • How many of them have a Post-Graduate degree in Law?
    • How many of them have a foreign degree in Law?
    • Which subject did they choose as their Graduation Specialization?
    • How many of them have a Post-Graduate degree in a subject other than Law?
  • What is the judge's designation? This has two subparts.
    • How many judges per state have been promoted to Chief Justice in any High Court?
    • How many judges per state have been promoted to Judge of the Supreme Court?

It's also possible that the questions you have in mind are not in this list. If so, add them. Now that you know the data and what you want to learn from it, you can finally move on to the data analysis.

🧹 Analyzing and Cleaning the Data

Importing the necessary libraries

As we are not doing any data visualization in Python here, we are not going to import any visualization library. For our task, it is sufficient to import pandas and NumPy. Let's import them.

import pandas as pd
import numpy as np

A quick look at the data

Now, let's see what our data looks like. As the file is in .csv format, we load it into pandas using the read_csv() function.

judge_data = pd.read_csv("khoj-1.8.csv")
judge_data.head()
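
Since only a handful of the 44 columns are used in this analysis, one option is to load just those columns. The sketch below assumes a tiny made-up CSV held in memory (the rows and the Some Other Column header are invented for illustration) and uses the usecols parameter of read_csv(); the notebook itself loads the full file and drops columns later.

```python
import io
import pandas as pd

# Made-up sample that mimics a few columns of the real CSV (values are illustrative only)
sample_csv = io.StringIO(
    "Name of the Judge,Gender,Date of Birth,Some Other Column\n"
    "Judge A,Male,1956-03-14,x\n"
    "Judge B,Female,Not Available,y\n"
)

# Hypothetical subset of columns to keep
wanted_cols = ["Name of the Judge", "Gender", "Date of Birth"]

# usecols tells read_csv to parse only the listed columns
subset = pd.read_csv(sample_csv, usecols=wanted_cols)
print(subset.columns.tolist())
```

Loading the full file first, as done above, is the safer choice while you are still exploring which columns matter.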

At first glance, we can notice that:

  • There are three date columns available - Date of Birth, Date of Appointment and Date of Retirement. Surprisingly, pandas reads them as object, the dtype it uses for strings and mixed values, so these columns are currently treated as text rather than dates.
  • Some columns contain the values Not Available and Not Applicable. Here Not Available denotes a missing value in the column, and Not Applicable means that the column does not apply to that specific judge.
date_cols = ['Date of Birth', 'Date of Appointment', 'Date of Retirement']
judge_data[date_cols].info()
judge_data.shape
judge_data.info()
judge_data.fileTitle.unique()
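
To get a feel for how widespread the Not Available and Not Applicable placeholders are, you could count them per column. This is a minimal sketch on a made-up three-row frame (the values are invented, not taken from the dataset); the same comparison would work on judge_data.

```python
import pandas as pd

# Toy frame with invented values; only the placeholder strings match the real data
toy = pd.DataFrame({
    "Date of Birth": ["1956-03-14", "Not Available", "1960-11-02"],
    "Date of Retirement": ["2018-03-13", "Not Applicable", "Not Available"],
    "Gender": ["Male", "Female", "Male"],
})

# Comparing the frame to a string gives a boolean frame; summing counts the matches per column
placeholder_counts = pd.DataFrame({
    "Not Available": (toy == "Not Available").sum(),
    "Not Applicable": (toy == "Not Applicable").sum(),
})
print(placeholder_counts)
```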

You don't need the full dates in these three columns. As you are only interested in the age of the judges, the year alone is sufficient. Before extracting it, though, you have to replace the Not Available value with np.nan. Otherwise, pd.to_datetime() will raise an error when it meets a string it cannot parse as a date.

for col in date_cols:
    # Replace the "Not Available" placeholder with np.nan
    judge_data[col] = judge_data[col].replace("Not Available", np.nan)

    # Convert the column to pandas datetime (datetime64) values
    judge_data[col] = pd.to_datetime(judge_data[col])

    # Keep only the year; missing dates become NaN
    judge_data[col] = judge_data[col].dt.year
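
As a possible alternative to replacing the placeholder by hand, pd.to_datetime() accepts errors="coerce", which turns any value it cannot parse into NaT. The sketch below uses a made-up series, and the ISO date format in it is an assumption; the real file may store dates differently.

```python
import pandas as pd

# Invented values; the date format here is assumed, not taken from the dataset
dates = pd.Series(["1956-03-14", "Not Available", "1960-11-02"])

# Unparseable strings become NaT instead of raising an error
years = pd.to_datetime(dates, errors="coerce").dt.year
print(years)
```

The trade-off is that errors="coerce" also hides genuinely malformed dates, so the explicit replacement used above makes it clearer which values were expected to be missing.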

After this operation, the date columns contain only the year, so it no longer makes sense to call them date columns. It's time to rename them.

# making a list containing the new name of the date columns
date_rename = ['Year of Birth', 'Year of Appointment', 'Year of Retirement']

# zip the date_cols and date_rename cols and making a dictionary with it
rename_dict = dict(zip(date_cols, date_rename))

# finally renaming the columns with that dictionary
judge_data.rename(columns=rename_dict, inplace=True)
judge_data[date_rename].head()
judge_data[date_rename].info()
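
With the three year columns in place, the age and tenure questions from the list reduce to simple subtractions. The following is a rough sketch on invented numbers; the derived column names Age at Appointment and Years Served are hypothetical, and because only years are compared, each result can be off by up to one year.

```python
import numpy as np
import pandas as pd

# Invented values; NaN stands for a year that was "Not Available"
toy = pd.DataFrame({
    "Year of Birth": [1956, 1960, np.nan],
    "Year of Appointment": [2004, 2010, 2016],
    "Year of Retirement": [2018, np.nan, np.nan],
})

# Hypothetical derived columns; rows with a missing year yield NaN
toy["Age at Appointment"] = toy["Year of Appointment"] - toy["Year of Birth"]
toy["Years Served"] = toy["Year of Retirement"] - toy["Year of Appointment"]

# mean() skips NaN by default
print(toy["Age at Appointment"].mean())
print(toy["Years Served"].mean())
```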

Let's see some summary statistics of these year columns.

judge_data[date_rename].describe()

You don't get much valuable information from the above summary, do you? What happens if we convert those columns to object and look at their summary statistics again?

# Select the non-object columns (here, the three year columns) and cast them
# to object in one step; this avoids assigning into a selection column by column
date_columns = judge_data.select_dtypes(exclude='object').astype('object')

# See the summary statistics (count, unique, top, freq)
date_columns.describe(include='O')

It seems that the most common year of birth is 1956, the most common year of appointment is 2016 and the most common year of retirement is 2018. That's all we got from these three columns.
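
Keep in mind that the top row reported by describe() for object columns is the most frequent value (the mode), and freq is how often it occurs. A small sketch on invented years shows how to check what share of the rows that value actually covers:

```python
import pandas as pd

# Invented years stored as object, mirroring the conversion above
years = pd.Series([1956, 1956, 1958, 1960, 1961], dtype="object")

summary = years.describe()  # count, unique, top, freq
print(summary["top"], summary["freq"])

# Share of rows that carry the most frequent value
share = summary["freq"] / summary["count"]
print(share)

# value_counts() shows the full ranking behind "top"
print(years.value_counts().head(3))
```

In this toy example the most frequent year covers well under half of the rows, so it is the most common value without describing most of the data.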

Now it's time to remove the unnecessary columns. For this, we have to look at the summary statistics of the categorical columns.