Identifying potential drug candidates through regression modeling with the ChEMBL database
Bioinformatics, Interdisciplinary (Data Science - Biology), Preliminary Drug Discovery
Prerequisites:
- Drug candidate: a compound with high potency and specificity for inhibiting its molecular target, without off-target effects.
- pIC50: the negative base-10 logarithm of the IC50 expressed in molar concentration, where IC50 is the half-maximal inhibitory concentration.
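The pIC50 definition reduces to one line of arithmetic: pIC50 = -log10(IC50 in M) = 9 - log10(IC50 in nM), since 1 nM = 1e-9 M. A minimal sketch, assuming the IC50 is reported in nanomolar; the function name `pic50_from_nm` is hypothetical.

```python
import math

def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar (nM) to pIC50.

    pIC50 = -log10(IC50 [M]) = 9 - log10(IC50 [nM]), because 1 nM = 1e-9 M.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9 - math.log10(ic50_nm)

# Example: an IC50 of 100 nM corresponds to a pIC50 of 7
print(round(pic50_from_nm(100), 3))  # 7.0
```

Working on the log scale means a tenfold drop in IC50 (a more potent compound) shows up as a +1 step in pIC50, which is easier to model than raw concentrations spanning several orders of magnitude.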
What is the problem?
We have thousands of candidate compounds for our target protein, human dihydroorotate dehydrogenase (DHODH), which is a validated therapeutic target for autoimmune diseases such as rheumatoid arthritis and multiple sclerosis. We therefore need a way to predict which of these compounds are potential drug candidates.
How can we approach a solution?
- End goal of the project: find a way to identify potential drug candidates for a given target protein.
- Identify drug databases to make informed decisions about the characteristics/features of a drug.
- Do we need an ML model? If so, supervised or unsupervised? That depends on whether we want a binary (candidate / non-candidate) outcome or a ranking of potential drug candidates.
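The binary-versus-ranking question can be made concrete with a small sketch. It uses made-up pIC50 values and an assumed activity cut-off of pIC50 >= 6 (IC50 <= 1 µM, a commonly used but arbitrary threshold); neither the data nor the threshold come from this project, and the names are hypothetical.

```python
# Made-up pIC50 values for four hypothetical compounds (illustration only)
pic50_by_compound = {"cmpd_A": 7.5, "cmpd_B": 5.2, "cmpd_C": 6.1, "cmpd_D": 4.8}

# Assumed cut-off: pIC50 >= 6.0 (IC50 <= 1 uM) counts as "active"
ACTIVE_THRESHOLD = 6.0

def is_active(pic50: float, threshold: float = ACTIVE_THRESHOLD) -> bool:
    """Binary framing: is the compound potent enough to call it a candidate?"""
    return pic50 >= threshold

# Binary (classification-style) view: one yes/no label per compound
labels = {name: is_active(v) for name, v in pic50_by_compound.items()}

# Ranking (regression-style) view: order compounds from most to least potent
ranking = sorted(pic50_by_compound, key=pic50_by_compound.get, reverse=True)

print(labels)   # {'cmpd_A': True, 'cmpd_B': False, 'cmpd_C': True, 'cmpd_D': False}
print(ranking)  # ['cmpd_A', 'cmpd_C', 'cmpd_B', 'cmpd_D']
```

A regression model that predicts pIC50 directly supports the ranking view, and a threshold can still be applied to its predictions afterwards to recover the binary view.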
Plan Stage
- Test for basic drug candidate characteristics: biological activity, chemical and metabolic activity, and low toxicity.
- Identified a database that aligns with our needs (); it also has a Python library, so we can use Python for this project.
- Gather the data and select preliminary features.
! pip install chembl_webresource_client
import os

import pandas as pd
from chembl_webresource_client.new_client import new_client

def clear():
    # Clear the console: 'cls' on Windows, 'clear' on Linux/macOS
    os.system('cls' if os.name == 'nt' else 'clear')
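# Search ChEMBL for targets whose name matches the protein of interest
targets = new_client.target
target_query = targets.search('Human DiHydroOrotate DeHydrogenase')
# Convert the search results to a DataFrame for inspection
targets = pd.DataFrame.from_dict(target_query)
targets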
file_name = 'Human DiHydroOrotate DeHydrogenase'
Selected row 4 (index 3) as the target; it is the Homo sapiens protein, with a score of 26 (https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL1966/).
selected_target = targets.target_chembl_id[3]
selected_target
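Hard-coding index 3 works for this particular search, but the row order of search results may change between ChEMBL releases. A minimal sketch of selecting the target by its properties instead, assuming each search record carries `organism`, `target_type`, `score` and `target_chembl_id` fields (as the columns of the `targets` DataFrame above typically do); the helper `pick_target` and its default filter values are hypothetical.

```python
from typing import Iterable, Mapping, Optional

def pick_target(records: Iterable[Mapping],
                organism: str = "Homo sapiens",
                target_type: str = "SINGLE PROTEIN") -> Optional[str]:
    """Return the target_chembl_id of the best-scoring record that matches
    the requested organism and target type, or None if nothing matches."""
    matches = [
        r for r in records
        if r.get("organism") == organism and r.get("target_type") == target_type
    ]
    if not matches:
        return None
    # Highest search score wins; on ties, the first record in the list is kept
    best = max(matches, key=lambda r: float(r.get("score") or 0))
    return best["target_chembl_id"]

# Usage with the DataFrame built above (requires the earlier network call):
# selected_target = pick_target(targets.to_dict("records"))
```

Filtering on organism and target type also guards against accidentally picking a protein family or a non-human ortholog that happens to share the name.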
Search ChEMBL for bioactivity records against the selected target, keeping only IC50 measurements (IC50 is a measure of a compound's potency).
# Query bioactivity records for the selected target, restricted to IC50 measurements
activity = new_client.activity
res = activity.filter(target_chembl_id=selected_target).filter(standard_type='IC50')
Create a DataFrame from the results.
df = pd.DataFrame.from_dict(res)
df
df.to_csv(file_name + '_01_bioactivity_data_raw.csv', index=False)
Analyze Stage 2
- Understand the data
- Data exploration and structuring
- Exploring features
- Feature engineering
Let's make a copy of the raw df and work on that, so the original data stays untouched.
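A minimal cleaning sketch on such a copy follows. It assumes the raw table has the ChEMBL activity columns `molecule_chembl_id`, `canonical_smiles`, `standard_value`, `standard_units` and `standard_relation`; the helper name `clean_bioactivity` and the filtering choices (nM only, exact `=` relations only, first record kept per molecule) are illustrative assumptions, not this project's final method.

```python
import numpy as np
import pandas as pd

def clean_bioactivity(df_raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a raw IC50 table; df_raw itself is not modified."""
    df = df_raw.copy()
    # standard_value often arrives as text; coerce to numbers (unparseable -> NaN)
    df["standard_value"] = pd.to_numeric(df["standard_value"], errors="coerce")
    # Keep rows with a positive numeric value, a SMILES string,
    # nanomolar units and an exact '=' relation (assumed filters)
    mask = (
        df["standard_value"].notna()
        & (df["standard_value"] > 0)
        & df["canonical_smiles"].notna()
        & (df["standard_units"] == "nM")
        & (df["standard_relation"] == "=")
    )
    # One row per molecule: keep the first record (a simplification; averaging is an alternative)
    df = df.loc[mask].drop_duplicates(subset="molecule_chembl_id")
    # pIC50 = 9 - log10(IC50 in nM)
    df = df.assign(pIC50=9 - np.log10(df["standard_value"]))
    cols = ["molecule_chembl_id", "canonical_smiles", "standard_value", "pIC50"]
    return df[cols].reset_index(drop=True)

# Usage: df_clean = clean_bioactivity(df)
```

Returning a new DataFrame rather than editing `df` in place keeps the raw data available for re-running the exploration with different filters.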