Combining PLS with GA-GP for QSAR

Abstract: In this paper, a new algorithm, partial least squares (PLS) improved by genetic algorithm–genetic programming (GA-GP), is applied to determine the function used for the inner relation in quantitative structure–activity relationship (QSAR) modeling. PLS is used to build a linear or nonlinear model between the principal components and the activity, and GA-GP is applied to the regressions and equations. The combination extends the range of models that PLS can express. Using a polynomial function as the inner relation, a set of 79 inhibitors of HIV-1 reverse transcriptase was studied, all derivatives of a recently reported HIV-1-specific lead, 1-[(2-hydroxyethoxy)methyl]-6-(phenylthio)thymine (HEPT). The resulting QSAR model shows high predictive ability, with r_cv = 0.900, which indicates that the method is useful.
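To make the idea of a polynomial inner relation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single response, a fixed polynomial degree, and weights taken directly from the covariance between the descriptors and the current residual. The GA-GP search over functional forms described in the abstract is not reproduced here, and the names `fit_poly_pls` and `predict_poly_pls` are hypothetical.

```python
import numpy as np

def fit_poly_pls(X, y, n_components=2, degree=2):
    """Single-response PLS with a polynomial inner relation (illustrative sketch).

    Assumption: the polynomial degree is fixed by the caller; no GA-GP search
    over the form of the inner relation is performed.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centered descriptors and response
    components = []
    for _ in range(n_components):
        w = E.T @ f                          # weight: direction of max covariance
        w = w / np.linalg.norm(w)
        t = E @ w                            # latent score for this component
        p = E.T @ t / (t @ t)                # loading used to deflate X
        coeffs = np.polyfit(t, f, degree)    # inner relation: f ~ poly(t)
        E = E - np.outer(t, p)               # deflate descriptors
        f = f - np.polyval(coeffs, t)        # deflate response
        components.append((w, p, coeffs))
    return x_mean, y_mean, components

def predict_poly_pls(model, X):
    """Predict by replaying the scores and inner relations component by component."""
    x_mean, y_mean, components = model
    E = X - x_mean
    y_hat = np.full(X.shape[0], y_mean)
    for w, p, coeffs in components:
        t = E @ w
        y_hat = y_hat + np.polyval(coeffs, t)
        E = E - np.outer(t, p)
    return y_hat
```

With `degree=1` the sketch reduces to an ordinary linear inner relation, so the two settings can be compared on the same data. In the approach described in the abstract, the form of the inner relation would be chosen by GA-GP rather than fixed in advance.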
