Data Curation can Improve the Prediction Accuracy of Metabolic Intrinsic Clearance

In vitro metabolic stability, often measured in human liver microsomes, is a key consideration at the screening stage of drug discovery. Computational prediction models can be built from the large quantity of experimental data available in public databases, but these databases typically contain data measured with various protocols in different laboratories, which raises concerns about data quality. In this study, we retrieved intrinsic clearance (CLint) measurements from an open database and performed extensive manual curation. Chemical descriptors were then calculated using freely available software, and prediction models were built using machine learning algorithms. The models trained on the curated data performed better than those trained on the non-curated data and achieved performance comparable to previously published models, showing the importance of manual curation in data preparation. The curated data have been made available so that our models are fully reproducible.
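As an illustration of the kind of problem curation addresses, the following is a minimal sketch of one possible automated pre-filter for CLint records pooled from different sources. It is not the manual procedure used in this study. The record layout, the compound identifiers, the target unit, and the 3-fold disagreement threshold are all assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw records: (compound_id, value, unit). Not data from the study.
RAW = [
    ("cpd-A", 12.0, "uL/min/mg"),
    ("cpd-A", 14.0, "uL/min/mg"),
    ("cpd-A", 0.013, "mL/min/mg"),
    ("cpd-B", 5.0, "uL/min/mg"),
    ("cpd-B", 80.0, "uL/min/mg"),
    ("cpd-C", 30.0, "mL/min/kg"),  # different basis; not convertible here
]

# Conversion factors to uL/min/mg protein (assumed target unit).
TO_UL_MIN_MG = {"uL/min/mg": 1.0, "mL/min/mg": 1000.0}


def curate(records, max_fold=3.0):
    """Harmonise units, then keep the median of consistent replicates.

    Records with unknown units or non-positive values are skipped.
    Compounds whose replicates differ by more than `max_fold`
    (an assumed threshold) are dropped for manual inspection.
    """
    grouped = defaultdict(list)
    for compound_id, value, unit in records:
        factor = TO_UL_MIN_MG.get(unit)
        if factor is None or value <= 0:
            continue
        grouped[compound_id].append(value * factor)

    curated = {}
    for compound_id, values in grouped.items():
        if max(values) / min(values) > max_fold:
            continue  # replicates disagree too much to merge automatically
        curated[compound_id] = median(values)
    return curated


if __name__ == "__main__":
    for compound_id, clint in curate(RAW).items():
        print(f"{compound_id}: {clint:.1f} uL/min/mg")
```

In this toy input only cpd-A survives: cpd-B has replicates that differ 16-fold, and cpd-C is reported on a basis that the sketch cannot convert. A rule-based filter of this sort can only flag such cases; deciding which measurement to trust still requires checking the original assay description.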
