Identifying Potential Drug Candidates through Regression Modeling with the ChEMBL Database

    Bioinformatics, Interdisciplinary (Data Science - Biology), Preliminary Drug Discovery

    Prerequisites:

    • Drug candidate: a compound with high potency and specificity for inhibiting its molecular target, without off-target effects.
    • pIC50: the negative base-10 logarithm of the IC50 expressed in molar concentration, where IC50 is the half-maximal inhibitory concentration. A higher pIC50 means a more potent compound.
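
    The pIC50 definition above can be sketched as a small conversion function. This is a minimal illustration, not the notebook's own code: the function name `ic50_nm_to_pic50` is hypothetical, and it assumes the IC50 is reported in nanomolar (nM), so that pIC50 = 9 - log10(IC50 in nM).

```python
import math

def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar (assumed unit) to pIC50.

    pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)

# A 1000 nM (1 µM) inhibitor has a pIC50 of about 6;
# a tenfold more potent 100 nM inhibitor has a pIC50 of about 7.
print(ic50_nm_to_pic50(1000.0))
print(ic50_nm_to_pic50(100.0))
```

    Working on the log scale compresses IC50 values that span several orders of magnitude into a range that is easier to model.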

    What is the problem?

    We have thousands of drug candidates for our target protein, human dihydroorotate dehydrogenase (DHODH), which is a validated therapeutic target for autoimmune diseases such as rheumatoid arthritis and multiple sclerosis. We therefore need a way to predict which candidates are the most promising.

    What are the things we can do to find a solution?

    • End goal of the project: find a way to identify potential drug candidates for a given target protein.
    • Identify drug databases, so that we can make informed decisions about the characteristics/features of a drug.
    • Do we need an ML model? If yes, should it be supervised or unsupervised? That depends on whether we want a binary (active/inactive) decision or a ranking of potential drug candidates.
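
    The binary-versus-ranking question in the last bullet can be made concrete with a short sketch. It assumes each compound already has a measured pIC50, which makes either framing a supervised problem. The function name `make_targets` and the 6.0 activity cutoff (an IC50 of 1 µM) are hypothetical choices for illustration, not values taken from this project.

```python
from typing import List, Tuple

def make_targets(
    pic50_values: List[float],
    active_threshold: float = 6.0,  # hypothetical cutoff, i.e. IC50 = 1 µM
) -> Tuple[List[float], List[int]]:
    """Build the two candidate label sets from measured pIC50 values.

    Returns (regression_y, classification_y):
      - regression_y keeps the continuous pIC50, which supports ranking
      - classification_y is 1 (active) or 0 (inactive) per compound
    """
    regression_y = list(pic50_values)
    classification_y = [1 if v >= active_threshold else 0 for v in pic50_values]
    return regression_y, classification_y

# Made-up pIC50 values, used only to show the two label types
pic50 = [4.2, 6.0, 7.5, 5.9]
reg_y, clf_y = make_targets(pic50)

# Ranking compounds from most to least potent using the continuous target
ranking = sorted(range(len(reg_y)), key=lambda i: reg_y[i], reverse=True)
print(clf_y)
print(ranking)
```

    A regression target keeps the full potency information and yields a ranking for free, while a binary label discards how far a compound sits from the cutoff.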

    Plan Stage

    • Test for basic drug candidate characteristics: biological activity, chemical and metabolic stability, and low toxicity.
    • Identify a database that aligns with our needs (ChEMBL). It also has a Python library, so we can use Python for this project.
    • Gather the data and select preliminary features.
    ! pip install chembl_webresource_client
    
    import pandas as pd
    from chembl_webresource_client.new_client import new_client
    import os
    def clear():
        # Clear the terminal: 'cls' on Windows, 'clear' on other systems
        os.system('cls' if os.name == 'nt' else 'clear')
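    # Handle for the ChEMBL target endpoint
    target_api = new_client.target

    # Free-text search for the target protein
    target_query = target_api.search('Human DiHydroOrotate DeHydrogenase')

    # Tabulate the search hits; `targets` is used below to pick the target
    targets = pd.DataFrame.from_dict(target_query)

    targets
    # Prefix for the output files written later
    file_name = 'Human DiHydroOrotate DeHydrogenase'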

    Select row 4 (index 3) as the target. This protein is from Homo sapiens and has a search score of 26: https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL1966/

    selected_target = targets.target_chembl_id[3]
    selected_target
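
    Picking the target by a hard-coded row index depends on the order of the search results, which may change over time. As a hedged alternative, the selection can be expressed as a filter on the table itself. The helper `pick_target` is hypothetical, and the column names `organism`, `target_type`, and `target_chembl_id`, as well as the value "SINGLE PROTEIN", are assumptions about what the target search returns. The demo table below uses placeholder IDs, not real ChEMBL identifiers.

```python
import pandas as pd

def pick_target(
    targets: pd.DataFrame,
    organism: str = "Homo sapiens",
    target_type: str = "SINGLE PROTEIN",  # assumed label for single-protein targets
) -> str:
    """Return the ChEMBL ID of the first row matching organism and target type."""
    mask = (targets["organism"] == organism) & (targets["target_type"] == target_type)
    matches = targets[mask]
    if matches.empty:
        raise ValueError(f"No {target_type} target found for {organism}")
    # Take the first match in table order
    return matches.iloc[0]["target_chembl_id"]

# Placeholder table mimicking the assumed shape of the search results
demo = pd.DataFrame({
    "organism": ["Rattus norvegicus", "Homo sapiens", "Homo sapiens"],
    "target_type": ["SINGLE PROTEIN", "PROTEIN FAMILY", "SINGLE PROTEIN"],
    "target_chembl_id": ["ID_RAT", "ID_FAMILY", "ID_HUMAN"],
})
print(pick_target(demo))
```

    With the real search results, `pick_target(targets)` would be used in place of `targets.target_chembl_id[3]`, provided the assumed columns are present.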

    Search ChEMBL for bioactivity records on this target, keeping only those that report an IC50 value, which is a measure of a compound's potency.

    activity = new_client.activity
    res = activity.filter(target_chembl_id=selected_target).filter(standard_type='IC50')

    Create a DataFrame from the result and save the raw data to a CSV file.

    df = pd.DataFrame.from_dict(res)
    df
    df.to_csv(file_name + '_01_bioactivity_data_raw.csv', index=False)

    Analyze Stage 2

    • Understand data
    • Data Exploration and Structuring
    • Exploring Features
    • Feature Engineering

    Let's make a copy of the raw DataFrame and work on that.