Site of metabolism prediction for six biotransformations mediated by cytochromes P450

MOTIVATION: One goal of metabolomics is to define and monitor the entire metabolite complement of a cell, but this goal remains far out of reach because systematic and rapid approaches for determining the biotransformations of newly discovered metabolites are lacking. In drug development, the metabolic biotransformation of a new chemical entity (NCE) is of even greater interest because it may profoundly affect the compound's bioavailability, activity and toxicity profile. In silico prediction of the site of metabolism (SOM) in phase I cytochrome P450-mediated reactions is usually a starting point for metabolic pathway studies and may also assist in drug/lead optimization.

RESULTS: This article reports SOM prediction for the six most important cytochrome P450 (CYP450)-mediated metabolic reactions, combining machine learning with semi-empirical quantum chemical calculations. Non-local models were developed on the basis of a large dataset comprising 1858 metabolic reactions extracted from 1034 heterogeneous chemicals. In validation, the overall accuracy was higher than 0.81 for all six reaction types and exceeded 0.90 for four of them. In further receiver operating characteristic (ROC) analyses, each SOM model gave an area under the curve (AUC) above 0.86, indicating good predictive power. In an external test on a previously published dataset, 80% of the experimentally observed SOMs were correctly identified by applying the full set of our SOM models.

AVAILABILITY: The program package SOME_v1.0 (Site Of Metabolism Estimator), developed on the basis of our models, is available at http://www.dddc.ac.cn/adme/myzheng/SOME_1_0.tar.gz.
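For readers unfamiliar with the two validation metrics quoted above, the following is a minimal Python sketch of how overall accuracy and ROC AUC can be computed for a binary "is this atom a SOM?" classifier. It is not taken from SOME_v1.0: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. The AUC is computed through its rank interpretation, that is, the probability that a randomly chosen true SOM scores higher than a randomly chosen non-SOM.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of candidate sites classified correctly at a score threshold.

    The 0.5 threshold is an illustrative assumption, not a value from the paper.
    """
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the (positive, negative) pairs in which the positive site has the
    higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = experimentally observed SOM, 0 = not a SOM.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(toy_labels, toy_scores):.3f}")
    print(f"AUC      = {roc_auc(toy_labels, toy_scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is one reason to report both.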