Identifying Potential Drug Candidates through Regression Modeling with the ChEMBL Database

Bioinformatics, Interdisciplinary (Data Science - Biology), Preliminary Drug Discovery

Prerequisites:

Drug candidate: a compound with high potency and specificity, so that it inhibits its molecular target without off-target effects.
pIC50: the negative base-10 logarithm of IC50 expressed in molar concentration, where IC50 is the half-maximal inhibitory concentration. A higher pIC50 means a more potent compound.
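
The pIC50 definition above can be made concrete with a short conversion. This is a minimal sketch, assuming the IC50 values are given in nanomolar (nM), which is the unit ChEMBL commonly reports for IC50; check the `standard_units` column before relying on that. The helper name `ic50_nm_to_pic50` is hypothetical and not part of any library.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar (nM) to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        # log10 is undefined for zero or negative concentrations
        raise ValueError("IC50 must be a positive concentration")
    ic50_molar = ic50_nm * 1e-9  # assumption: input is in nM, so scale to M
    return -math.log10(ic50_molar)


# Lower IC50 (more potent compound) gives a higher pIC50.
for ic50_nm in (1.0, 100.0, 10000.0):
    print(f"IC50 = {ic50_nm:>8.1f} nM  ->  pIC50 = {ic50_nm_to_pic50(ic50_nm):.2f}")
```

Working on the log scale compresses IC50 values that span several orders of magnitude into a narrow range, which is usually easier for a regression model to fit.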

What is the problem?

We have thousands of drug candidates for our target protein, human dihydroorotate dehydrogenase (DHODH), which is a validated therapeutic target for autoimmune diseases such as rheumatoid arthritis and multiple sclerosis. So we need a way to predict which candidates are the most promising.

What are the things we can do to find a solution?

End goal of the project: find a way to identify potential drug candidates for a given target protein.
Identify drug databases, so we can make informed decisions about the characteristics/features of a drug.
Do we need an ML model? If yes, supervised or unsupervised? That depends on whether we want a binary (active/inactive) label or a ranking of potential drug candidates.

Plan Stage

Test for basic drug candidate characteristics: biological activity, chemical and metabolic activity, low toxic effects.
Identified a database that aligned with our needs (ChEMBL); it also has a Python library, so we can use Python for this project.
Gather the data and select preliminary features.

! pip install chembl_webresource_client


import os

import pandas as pd
from chembl_webresource_client.new_client import new_client


def clear():
    # Clear the console: 'cls' on Windows, 'clear' elsewhere.
    os.system('cls' if os.name == 'nt' else 'clear')

targets = new_client.target

target_query = targets.search('Human DiHydroOrotate DeHydrogenase')


targets = pd.DataFrame.from_dict(target_query)

targets

file_name = 'Human DiHydroOrotate DeHydrogenase'

Selected row 4 (index 3) as the target. The protein is from Homo sapiens, with a score of 26: https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL1966/

selected_target = targets.target_chembl_id[3]
selected_target
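
Picking the target by row position works for this search, but the row order could change if the search results do. As a hedged alternative, the sketch below selects the row by its `organism` and `target_type` values instead. It runs on a small toy DataFrame rather than a live query: only `CHEMBL1966` comes from this notebook, the other IDs are placeholders, and the helper `pick_target` is hypothetical.

```python
import pandas as pd

# Toy stand-in for the target search result (placeholder rows, not real ChEMBL data,
# except for the CHEMBL1966 entry selected above).
toy_targets = pd.DataFrame({
    "organism": ["Mus musculus", "Rattus norvegicus",
                 "Plasmodium falciparum", "Homo sapiens"],
    "target_type": ["SINGLE PROTEIN"] * 4,
    "target_chembl_id": ["TOY_ID_1", "TOY_ID_2", "TOY_ID_3", "CHEMBL1966"],
})


def pick_target(targets_df, organism="Homo sapiens", target_type="SINGLE PROTEIN"):
    """Return the first target_chembl_id matching the organism and target type."""
    mask = (targets_df["organism"] == organism) & (targets_df["target_type"] == target_type)
    matches = targets_df.loc[mask, "target_chembl_id"]
    if matches.empty:
        raise LookupError(f"no {target_type} target found for {organism}")
    return matches.iloc[0]


print(pick_target(toy_targets))
```

With the real search result, the same call would be `pick_target(targets)`; it is still worth confirming the match against the target report card.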

Search ChEMBL for bioactivity records against this target, keeping only those reported as IC50, which is a measure of a compound's potency.

activity = new_client.activity
res = activity.filter(target_chembl_id=selected_target).filter(standard_type='IC50')

Create a DataFrame from the result.

df = pd.DataFrame.from_dict(res)
df

df.to_csv(file_name + '_01_bioactivity_data_raw.csv', index=False)

Analyze Stage 2

Understand data
Data Exploration and Structuring
Exploring Features
Feature Engineering

Let's make a copy of the raw DataFrame and work on that, so the original data stays untouched.