Optimisation of structure representation for QSAR studies

Abstract Optimisation of a spectrum-like structure representation by a genetic algorithm (GA) is described. The final optimised structure representation of 28 molecules (flavonoid derivatives, inhibitors of the enzyme p56lck protein tyrosine kinase) contains only 15 variables, compared with 120 in the initial spectrum-like representation. Counterpropagation artificial neural network (ANN) models served as the fitness function in the variable-reduction step of the GA procedure. Each chromosome in turn was used as the code for a new representation, and a new ANN model was trained and tested for it. The correlation coefficient r between the experimental biological activity and the value predicted by the ANN model was calculated for the test set of 14 compounds (not used in training). This correlation coefficient r serves as the final fitness criterion governing selection and reproduction when the genetic procedure generates a new population. Because the spectrum-like structure representation is reversible, each variable of the representation can be traced back to a structural feature. Consequently, the 15 variables selected by the GA optimisation pinpoint the spatial directions (with respect to the skeleton) most responsible for the biological activities of the entire series of compounds.
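
The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the GA operators, so the truncation selection, one-point crossover, bit-flip mutation, elitism and all parameter values below are assumptions, and a simple nearest-neighbour predictor (`predict_1nn`, a hypothetical stand-in) replaces the counterpropagation ANN. Only the overall scheme follows the text: each chromosome is a binary mask over the representation variables, a model is built on the training compounds using the masked variables, and the correlation coefficient r on the test compounds is the fitness.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # r is undefined for a constant sequence; treat as no correlation
    return sxy / sqrt(sxx * syy)


def predict_1nn(train_X, train_y, x, mask):
    """Assumed stand-in for the ANN: activity of the nearest training compound,
    with distance measured only over the selected variables."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((train_X[i][j] - x[j]) ** 2 for j in mask),
    )
    return train_y[nearest]


def fitness(chrom, train_X, train_y, test_X, test_y):
    """Fitness of one chromosome: r between predicted and experimental test activities."""
    mask = [j for j, keep in enumerate(chrom) if keep]
    if not mask:
        return -1.0  # a chromosome selecting no variables gets the worst score
    preds = [predict_1nn(train_X, train_y, x, mask) for x in test_X]
    return pearson_r(preds, test_y)


def evolve(train_X, train_y, test_X, test_y, n_vars,
           pop_size=20, generations=30, p_mut=0.02, seed=0):
    """Evolve binary variable masks; returns (best chromosome, its fitness)."""
    rng = random.Random(seed)

    def score(chrom):
        return fitness(chrom, train_X, train_y, test_X, test_y)

    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection (assumption)
        children = [ranked[0][:]]              # elitism: carry the best one over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)     # one-point crossover (assumption)
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best)
```

With the data set of the abstract, `train_X` and `test_X` would each hold 14 compounds described by 120 variables, and the indices set to `True` in the returned chromosome would be the retained variables. The sketch does not constrain how many variables are kept, so it will not necessarily arrive at 15.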
