Comparison of Linear and Nonlinear Classification Algorithms for the Prediction of Drug and Chemical Metabolism by Human UDP-Glucuronosyltransferase Isoforms

Partial least squares discriminant analysis (PLSDA), Bayesian regularized artificial neural network (BRANN), and support vector machine (SVM) methodologies were compared for their ability to classify substrates and nonsubstrates of 12 isoforms of human UDP-glucuronosyltransferase (UGT), an enzyme "superfamily" involved in the metabolism of drugs, nondrug xenobiotics, and endogenous compounds. Simple two-dimensional descriptors were used to capture chemical information. For each data set, 70% of the data were used for training, and the remaining 30% were used to assess generalization performance. In general, SVM produced the models with the best predictive performance, followed by BRANN and then PLSDA. However, for a small number of data sets PLSDA gave equivalent or better predictive performance, which may indicate relatively linear relationships in these data sets. All SVM models showed predictive ability (>60% of the test set predicted correctly), and five of the 12 test sets showed excellent prediction (>80% prediction accuracy). These models represent the first use of pattern recognition methods to discriminate between substrates and nonsubstrates of human drug metabolizing enzymes and the first thorough assessment of three classification algorithms using multiple metabolic data sets.
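The evaluation protocol described above (a 70% training split, the remainder held out, and test-set accuracy judged against the 60% and 80% thresholds) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `split_train_test`, `accuracy`, and `rate_model`, the random shuffling, and the fixed seed are all assumptions, since the abstract does not say how the split was drawn or which software was used.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle records and split them into training and test subsets.

    Hypothetical helper: the abstract states only the 70% proportion,
    so random shuffling with a fixed seed is an assumption.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(true_labels, predicted_labels):
    """Fraction of held-out compounds whose class was predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("cannot score an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def rate_model(test_accuracy):
    """Apply the abstract's thresholds: >80% excellent, >60% predictive."""
    if test_accuracy > 0.80:
        return "excellent"
    if test_accuracy > 0.60:
        return "predictive"
    return "not predictive"


# Toy data: (compound id, label), where 1 = substrate and 0 = nonsubstrate.
compounds = [(i, i % 2) for i in range(100)]
train, test = split_train_test(compounds)

# Score a made-up set of predictions against made-up true labels.
toy_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
toy_rating = rate_model(toy_accuracy)
```

Any classifier (PLSDA, BRANN, or SVM) would be fitted on `train` and its predictions for `test` passed to `accuracy`; the thresholds in `rate_model` are strict inequalities, matching the ">60%" and ">80%" wording.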
