Multi-space classification for predicting GPCR-ligands

Summary

The classification of molecules depends on the descriptor set used to represent the compounds, and each descriptor can be regarded as one perception of a molecule. In this study we show that a combination of several classifiers, each grounded on a separate descriptor set, can be superior to a single classifier built using all available descriptors. The task of predicting ligands of G-protein coupled receptors (GPCRs) served as an example application. The perceptron, multilayer neural networks, and radial basis function (RBF) networks were employed for prediction. We developed classifiers with and without descriptor selection. Prediction accuracy was assessed by the area under the receiver operating characteristic (ROC) curve. With descriptor selection, both the choice of descriptors and their rank order depended on the type and topology of the neural network. We demonstrate that the overall prediction accuracy of the system can be improved by joining neural network classifiers of different type and topology using a “jury network” that is trained to evaluate the predictions from the individual classifiers. The approach yielded 71% correct prediction of GPCR ligands.
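The two ideas the summary relies on, scoring by ROC area and combining individual classifiers through a trained jury, can be illustrated with a minimal sketch. It is not the study's implementation: the jury here is assumed to be a single logistic unit, whereas the summary does not state the jury network's architecture, and the function names (`roc_auc`, `train_jury`, `jury_predict`) and the toy scores in `BASE_OUTPUTS` are invented for illustration.

```python
import math

def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def jury_predict(weights, bias, x):
    """Jury output for one compound, given the base classifiers' outputs x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_jury(base_outputs, labels, epochs=500, lr=1.0):
    """Fit a logistic unit on the base classifiers' outputs by
    full-batch gradient descent on the cross-entropy loss.
    ASSUMPTION: a single logistic unit stands in for the jury network."""
    n_inputs = len(base_outputs[0])
    weights = [0.0] * n_inputs
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for x, y in zip(base_outputs, labels):
            err = jury_predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical outputs of two base classifiers (one per descriptor set)
# for six compounds; label 1 = ligand, 0 = non-ligand.
BASE_OUTPUTS = [(0.9, 0.4), (0.4, 0.9), (0.8, 0.8),
                (0.6, 0.1), (0.1, 0.6), (0.3, 0.3)]
LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_jury(BASE_OUTPUTS, LABELS)
    jury_scores = [jury_predict(w, b, x) for x in BASE_OUTPUTS]
    print("AUC classifier A:", roc_auc([x[0] for x in BASE_OUTPUTS], LABELS))
    print("AUC classifier B:", roc_auc([x[1] for x in BASE_OUTPUTS], LABELS))
    print("AUC jury:        ", roc_auc(jury_scores, LABELS))
```

In this toy set each base classifier misranks one ligand/non-ligand pair, while the combined evidence from both separates the two classes. The jury is scored on the same compounds it was trained on, which is acceptable only for illustration; a real evaluation would use held-out data.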
