Improved Prediction of CYP-Mediated Metabolism with Chemical Fingerprints

Molecule and atom fingerprints, similar to path-based Daylight fingerprints, can substantially improve the accuracy of P450 site-of-metabolism prediction models. Only two chemical fingerprints have been used in metabolism prediction, so little is known about how fingerprint parameters affect site-of-metabolism predictions, and different fingerprints might yield more accurate models. Here, we study whether tuning fingerprints to specific site-of-metabolism data sets can lead to improved models. We measure the impact of 484 chemical fingerprints on the accuracy of P450 site-of-metabolism prediction models across nine P450 isoform site-of-metabolism data sets. Using a range of search depths, we study path, circular, and subgraph fingerprints. We also consider two labelings: standard SMILES labels, and a labeling that marks ring bonds differently from nonring bonds, which encodes the ortho, meta, and para positioning of substituents more clearly. Optimal fingerprint models chosen by cross-validation performance on the full training data are, on average, 3.8% (Top-2; percent of molecules with a site of metabolism in the top two predictions) and 1.4% (AUC; area under the ROC curve) more accurate than base fingerprint models. These gains represent, respectively, a 25.6% and 16.7% reduction in error. A more rigorous assessment selects fingerprints within each cross-validation fold; this sometimes selects different fingerprints for different folds, but yields a more reliable estimate of generalization error. In this assessment, averaging the scores from the top few fingerprints yields performance improvements of, on average, 3.0% (Top-2) and 0.7% (AUC). These gains are statistically significant and represent, respectively, a 20.1% and 8.8% reduction in error.
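To make the fingerprint parameters concrete, the following Python sketch enumerates the labeled simple paths that start at one atom, up to a chosen search depth, and optionally marks ring bonds differently from nonring bonds. It is an illustration under stated assumptions, not the authors' implementation: molecules are plain adjacency dictionaries, bond symbols are SMILES-style characters, and the `@` ring-bond marker and all function names (`is_ring_bond`, `bond_label`, `atom_path_fingerprint`) are hypothetical.

```python
"""Sketch of atom-level path fingerprints with optional ring-bond labels.

Assumptions (hypothetical, not the paper's code): a molecule is an
adjacency dict {atom: {neighbor: bond_symbol}}, atoms are element symbols,
and bond symbols are SMILES-style characters such as "-" or "=".
"""
from collections import deque


def is_ring_bond(adj, a, b):
    """Return True if atoms a and b stay connected when bond a-b is ignored."""
    seen = {a}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if {u, v} == {a, b}:
                continue  # skip the bond being tested
            if v == b:
                return True
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def bond_label(adj, a, b, ring_labels):
    """SMILES-style bond symbol; ring bonds get a hypothetical '@' suffix."""
    symbol = adj[a][b]
    if ring_labels and is_ring_bond(adj, a, b):
        return symbol + "@"
    return symbol


def atom_path_fingerprint(atoms, adj, root, depth, ring_labels=False):
    """Set of labeled simple paths of up to `depth` bonds starting at `root`."""
    features = set()

    def extend(path, label):
        features.add(label)
        if len(path) - 1 == depth:  # reached the search depth
            return
        last = path[-1]
        for nxt in adj[last]:
            if nxt in path:
                continue  # simple paths only: never revisit an atom
            step = bond_label(adj, last, nxt, ring_labels) + atoms[nxt]
            extend(path + [nxt], label + step)

    extend([root], atoms[root])
    return features
```

With standard labels at depth 2, the methyl carbon of methylcyclopropane and a terminal carbon of propane yield the same path set, {C, C-C, C-C-C}; with ring labels, the methylcyclopropane paths become {C, C-C, C-C-@C}, so the two environments are distinguished. Daylight-style fingerprints typically hash such path strings into a fixed-length bit vector; that step is omitted here.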
Across isoforms, the top-performing fingerprints showed little consistency: different fingerprints worked best for different isoforms. These results suggest that important gains in site-of-metabolism modeling are achievable by including and optimizing atom and molecule fingerprints. The optimal site-of-metabolism models determined by this approach are available for use at http://swami.wustl.edu/.
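The evaluation quantities used above can be sketched in Python as follows, assuming each molecule is represented by a list of per-atom scores plus the set of indices of its known sites of metabolism. The helper names are hypothetical, error is taken to be 100 minus the accuracy percentage, and AUC is computed here with the pairwise Mann-Whitney formulation over atoms; whether atoms are pooled across molecules or AUC is averaged per molecule is not stated in this abstract. `select_per_fold` illustrates choosing fingerprints using only each outer fold's training indices, with `inner_score` standing in for whatever inner cross-validation score is used.

```python
"""Sketch of Top-2, AUC, error reduction, and per-fold fingerprint selection.

Hypothetical helpers, not the paper's code. A molecule is a pair
(atom_scores, true_sites), where true_sites is a set of atom indices.
"""


def top2_hit(atom_scores, true_sites):
    """True if a known site of metabolism is among the two top-scored atoms."""
    # sorted() is stable, so ties are broken by atom order
    ranked = sorted(range(len(atom_scores)), key=lambda i: atom_scores[i], reverse=True)
    return any(i in true_sites for i in ranked[:2])


def top2_accuracy(molecules):
    """Percent of molecules with a site of metabolism in the top two predictions."""
    hits = sum(top2_hit(scores, sites) for scores, sites in molecules)
    return 100.0 * hits / len(molecules)


def auc(scores, labels):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_error_reduction(base_pct, new_pct):
    """Fraction of the base model's error (100 - accuracy) removed by the new model."""
    return (new_pct - base_pct) / (100.0 - base_pct)


def average_scores(score_lists):
    """Average per-atom scores from several fingerprint models (e.g., the top few)."""
    n = len(score_lists)
    return [sum(col) / n for col in zip(*score_lists)]


def select_per_fold(fingerprints, folds, inner_score, k=3):
    """For each outer (train_idx, test_idx) fold, keep the k fingerprints with the
    best inner_score on train_idx only; test indices never influence selection."""
    chosen = []
    for train_idx, _test_idx in folds:
        ranked = sorted(fingerprints, key=lambda fp: inner_score(fp, train_idx), reverse=True)
        chosen.append(ranked[:k])
    return chosen
```

For example, a hypothetical improvement from 80% to 85% Top-2 accuracy removes a quarter of the remaining error: `relative_error_reduction(80.0, 85.0)` returns 0.25. Because `select_per_fold` ranks fingerprints separately for each fold, it can return different fingerprints for different folds, which is the behavior the more rigorous assessment accepts in exchange for an unbiased test estimate.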