A self‐adaptive genetic algorithm‐artificial neural network algorithm with leave‐one‐out cross validation for descriptor selection in QSAR study

Genetic algorithms (GAs) were used to select molecular descriptors for quantitative structure‐activity relationship (QSAR) models built with artificial neural networks (ANNs), and also helped to improve the back‐propagation training algorithm. Leave‐one‐out cross validation was used to assess the validity of the generated ANN models and of the preferred variable combinations found by the GA. A self‐adaptive GA‐ANN model was established by using a new estimate function that avoids over‐fitting during ANN training. Compared with the variables selected in two recent QSAR studies based on stepwise multiple linear regression (MLR) models, the variables selected by the self‐adaptive GA‐ANN model were superior for constructing ANN models: they gave a higher cross‐validation (CV) coefficient (Q2) and a lower root mean square deviation, both in the established model and in the prediction of biological activity. Additional validation methods, including leave‐multiple‐out, Y‐randomization, and external validation, showed that the established GA‐ANN models outperform the MLR models in both stability and predictive power. Self‐adaptive GA‐ANN therefore offers a promising route to improving QSAR models. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010
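
To make the leave‐one‐out criterion concrete, the following is a minimal Python sketch of how Q2 and the root mean square deviation might be computed for one candidate descriptor subset. It is not the authors' implementation. The function name `loo_q2_rmsd`, the synthetic data, and the use of an ordinary least‐squares model in place of the ANN are all assumptions made to keep the example short. Q2 is taken here as 1 − PRESS/SS_tot, with SS_tot computed around the mean of all observed activities, which is one common convention and may differ from the definition used in the paper.

```python
import numpy as np


def loo_q2_rmsd(X, y, columns):
    """Leave-one-out Q2 and RMSD for the descriptor subset `columns`.

    Assumption: a least-squares linear model stands in for the ANN.
    """
    X = np.asarray(X, dtype=float)[:, list(columns)]
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # intercept + selected descriptors
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # hold out compound i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef  # predict the held-out compound
    press = np.sum((y - preds) ** 2)  # predictive residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot, np.sqrt(press / n)


# Synthetic, hypothetical data: the activity depends on descriptors 0 and 1 only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1]

q2_good, rmsd_good = loo_q2_rmsd(X_demo, y_demo, [0, 1])
q2_bad, rmsd_bad = loo_q2_rmsd(X_demo, y_demo, [2])
```

In a GA‐based selection, a score of this kind could serve as the fitness of each candidate subset, so that subsets with a higher Q2 (such as `[0, 1]` in the synthetic data above) are favoured over subsets of irrelevant descriptors.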
