A modified uncorrelated linear discriminant analysis model coupled with recursive feature elimination for the prediction of bioactivity

To provide accurate, robust, and interpretable predictions of bioactivity, a modified uncorrelated linear discriminant analysis (M-ULDA) model was developed. In addition, recursive feature elimination (RFE), a feature selection method originally devised for support vector machines (SVMs), was adapted to fit the ULDA scheme. In an evaluation on six pharmaceutical datasets, M-ULDA coupled with RFE achieved classification accuracy better than or comparable to that of other well-studied methods such as SVMs and decision trees. By significantly reducing the number of variables used in the final model, RFE for ULDA increases computational speed and provides useful insights into the biochemical mechanisms related to pharmaceutical activity.
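The general idea of pairing RFE with a linear discriminant can be illustrated with a minimal sketch. It is not the M-ULDA procedure itself, whose details the abstract does not give: it assumes a plain two-class Fisher discriminant with a small ridge term as a stand-in, features on comparable scales, and a ranking rule that drops the variable with the smallest absolute weight at each step. All function names (`solve`, `fisher_weights`, `rfe`, `make_toy_data`) and the toy data are hypothetical.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fisher_weights(X, y, cols, ridge=1e-3):
    """Two-class Fisher discriminant weights using only the columns in cols.

    Assumption: labels are 0 or 1. The ridge term keeps the pooled
    within-class scatter matrix invertible.
    """
    k = len(cols)
    groups = {0: [], 1: []}
    for row, label in zip(X, y):
        groups[label].append([row[c] for c in cols])
    means = {g: [sum(r[j] for r in rows) / len(rows) for j in range(k)]
             for g, rows in groups.items()}
    sw = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    for g, rows in groups.items():
        for r in rows:
            d = [r[j] - means[g][j] for j in range(k)]
            for i in range(k):
                for j in range(k):
                    sw[i][j] += d[i] * d[j]
    diff = [means[1][j] - means[0][j] for j in range(k)]
    return solve(sw, diff)

def rfe(X, y, n_keep):
    """Refit the discriminant and drop the lowest-weight column until n_keep remain."""
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        w = fisher_weights(X, y, cols)
        worst = min(range(len(cols)), key=lambda j: abs(w[j]))
        del cols[worst]
    return cols

def make_toy_data(seed=0, n_per_class=100, shift=5.0):
    """Hypothetical data: columns 0 and 2 separate the classes, 1 and 3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(4)]
            row[0] += shift * label
            row[2] += shift * label
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_data()
print(rfe(X, y, n_keep=2))  # indices of the retained columns
```

On this toy set the two noise columns are eliminated first, so the final model rests only on the informative variables. That is the sense in which RFE both shrinks the model and points to the descriptors that drive the classification.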
