Predicting human liver microsomal stability with machine learning techniques.

To sustain a pipeline in pharmaceutical research, lead candidates must show appropriate metabolic stability during the drug discovery process. In vitro ADMET (absorption, distribution, metabolism, elimination, and toxicity) screening provides useful information on the metabolic stability of compounds. Before the synthesis stage, however, an efficient process is needed to handle the vast quantity of data from large compound libraries and high-throughput screening. Here we derive a relationship between chemical structure and metabolic stability for a data set of in-house compounds using various in silico machine learning methods such as random forest, support vector machine (SVM), logistic regression, and recursive partitioning. For model building, 1952 proprietary compounds in two classes (stable/unstable) were used, with 193 descriptors calculated by Molecular Operating Environment. Results on the test compounds showed that all classifiers performed satisfactorily (accuracy > 0.8, sensitivity > 0.9, specificity > 0.6, and precision > 0.8). Above all, classification by random forest and by SVM yielded kappa values of approximately 0.7 on an independent validation set, slightly higher than those of the other classification tools. These results suggest that nonlinear/ensemble-based classification methods might prove useful in the area of in silico ADME modeling.
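The performance measures quoted above (accuracy, sensitivity, specificity, precision, and kappa) can all be computed from a two-class confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code. The function name `classification_metrics` and the example counts are hypothetical, and treating "stable" as the positive class is an assumption, since the abstract does not say which class was taken as positive.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return standard binary-classification metrics from confusion-matrix counts.

    tp: positives predicted positive    fn: positives predicted negative
    fp: negatives predicted positive    tn: negatives predicted negative
    Assumption: "stable" is treated as the positive class.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)     # positive predictive value

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the study.
    metrics = classification_metrics(tp=90, fn=10, fp=30, tn=70)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Kappa is a useful complement to accuracy here because it discounts the agreement expected by chance, which matters when the two classes are unevenly represented.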
