Predicting Mouse Liver Microsomal Stability with “Pruned” Machine Learning Models and Public Data

ABSTRACT

Purpose: Mouse efficacy studies are a critical hurdle to advance translational research of potential therapeutic compounds for many diseases. Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial model system used to assess metabolic stability. Consequently, we explored the development of machine learning models that can enhance the probability of identifying compounds possessing MLM stability.

Methods: Published assays on MLM half-life values were identified in PubChem, reformatted, and curated to create a training set with 894 unique small molecules. These data were used to construct machine learning models assessed with internal cross-validation, external tests with a published set of antitubercular compounds, and independent validation with an additional diverse set of 571 compounds (PubChem data on percent metabolism).

Results: “Pruning” out the moderately unstable / moderately stable compounds from the training set produced models with superior predictive power. Bayesian models displayed the best predictive power for identifying compounds with a half-life ≥1 h.

Conclusions: Our results suggest the pruning strategy may be of general benefit to improve test set enrichment and provide machine learning models with enhanced predictive value for the MLM stability of small organic molecules. This study represents the most exhaustive study to date of using machine learning approaches with MLM data from public sources.
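The pruning step described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `prune_and_label`, the compound identifiers, and the edges of the "moderate" band (30 and 120 min) are hypothetical choices made only for illustration. The only value taken from the abstract is the 1 h (60 min) half-life cutoff for a stable compound.

```python
from typing import Dict


def prune_and_label(
    half_lives_min: Dict[str, float],
    low: float = 30.0,     # assumed lower edge of the moderate band (hypothetical)
    high: float = 120.0,   # assumed upper edge of the moderate band (hypothetical)
    cutoff: float = 60.0,  # from the abstract: half-life >= 1 h counts as stable
) -> Dict[str, bool]:
    """Drop compounds in the moderate band, then label the rest.

    Returns a mapping of compound id -> True (stable) or False (unstable).
    """
    if not (low <= cutoff <= high):
        raise ValueError("the pruned band must contain the stability cutoff")

    labeled: Dict[str, bool] = {}
    for compound_id, t_half in half_lives_min.items():
        # Compounds with low <= t_half < high are treated as
        # "moderately unstable / moderately stable" and pruned.
        if low <= t_half < high:
            continue
        labeled[compound_id] = t_half >= cutoff
    return labeled


# Made-up half-lives in minutes, for illustration only.
example = {"cmpd_A": 5.0, "cmpd_B": 45.0, "cmpd_C": 90.0, "cmpd_D": 240.0}
pruned = prune_and_label(example)
print(pruned)  # {'cmpd_A': False, 'cmpd_D': True}
```

In this sketch only the clearly unstable and clearly stable compounds remain as training examples; a classifier (for instance a Bayesian model over molecular fingerprints) would then be fit to the pruned labels and evaluated on an unpruned test set.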
