Quantitative Structure-Electrochemistry Relationship Study of Some Organic Compounds Using PC-ANN and PCR

Motivation. A QSPR analysis was conducted on the half-wave reduction potential (E1/2) of a diverse set of organic compounds by means of principal component regression (PCR) and principal component-artificial neural network (PC-ANN) modeling. A genetic algorithm (GA) was employed as the factor selection procedure for both modeling methods, and the results were compared with two other factor selection methods, namely eigenvalue ranking (EV) and correlation ranking (CR).

Method. More than 1000 structural descriptors were calculated for each molecule with the Dragon software. The descriptor data matrix was subjected to principal component analysis, and the most significant principal components (PCs) were extracted. Multiple linear regression and an artificial neural network were used for linear and nonlinear modeling, respectively, of the relationship between the extracted PCs and E1/2. First, the PCs were ranked by decreasing eigenvalue and entered successively into each model. Second, the PCs were ranked by their correlation with the half-wave potentials (linear correlation for PCR, nonlinear correlation for PC-ANN) and entered into the models in that order. Finally, a GA was used to select the best set of factors for both models.

Results. The first 30 extracted PCs explained 96% of the variance in the descriptor data matrix. Among these, 10, 6 and 10 PCs were selected by EV, CR and GA, respectively, for PCR, while for the ANN model 7 PCs were selected by each of the factor selection procedures. The ANN models with EV, CR and GA factor selection explained 78.4%, 94.3% and 96% of the variance in the E1/2 data, respectively. The corresponding values for the PCR models were 52.9%, 58.2% and 74.4%.

Conclusions. Factor selection by correlation ranking and by genetic algorithm gave superior results relative to eigenvalue ranking. This confirms that the magnitude of the eigenvalue of a PC is not necessarily a measure of its significance in calibration. Moreover, for the PCR method the results obtained by GA differed markedly from those obtained by EV and CR, whereas for the PC-ANN model the GA and CR factor selection methods gave results close to each other.
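
The difference between eigenvalue ranking and correlation ranking in PCR can be illustrated with a short sketch. This is not the authors' implementation and does not use the paper's data: the descriptor matrix and response below are synthetic, and the function names (`pca_scores`, `rank_by_eigenvalue`, `rank_by_correlation`, `pcr_r2`) are hypothetical helpers introduced only for this example. The response is deliberately tied to a low-variance component, so that a PC with a small eigenvalue carries most of the predictive information. GA-based selection and the nonlinear PC-ANN model are not covered here.

```python
import numpy as np

def pca_scores(X, n_components):
    """Return PC scores and eigenvalues of the column-centered matrix X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    eigenvalues = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, eigenvalues

def rank_by_eigenvalue(eigenvalues):
    """EV ranking: PC indices ordered by decreasing eigenvalue."""
    return np.argsort(eigenvalues)[::-1]

def rank_by_correlation(scores, y):
    """CR ranking: PC indices ordered by decreasing |linear correlation| with y."""
    yc = y - y.mean()
    # Score columns are already centered because X was centered.
    r = (scores * yc[:, None]).sum(axis=0) / (
        np.linalg.norm(scores, axis=0) * np.linalg.norm(yc)
    )
    return np.argsort(np.abs(r))[::-1]

def pcr_r2(scores, y, selected):
    """Fit y on the selected PC scores by least squares; return R^2."""
    A = np.column_stack([np.ones(len(y)), scores[:, selected]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    yc = y - y.mean()
    return 1.0 - (resid @ resid) / (yc @ yc)

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n_samples, n_descriptors, n_latent = 60, 40, 8
latent = rng.standard_normal((n_samples, n_latent))
scales = np.array([10.0, 8.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0])
loadings, _ = np.linalg.qr(rng.standard_normal((n_descriptors, n_latent)))
X = (latent * scales) @ loadings.T + 0.01 * rng.standard_normal((n_samples, n_descriptors))
# The response depends on a low-variance latent direction.
y = latent[:, 6] + 0.1 * rng.standard_normal(n_samples)

scores, eigvals = pca_scores(X, n_components=n_latent)
k = 2  # number of PCs entered into the model
r2_ev = pcr_r2(scores, y, rank_by_eigenvalue(eigvals)[:k])
r2_cr = pcr_r2(scores, y, rank_by_correlation(scores, y)[:k])
print(f"R^2 with top-{k} PCs by eigenvalue:  {r2_ev:.3f}")
print(f"R^2 with top-{k} PCs by correlation: {r2_cr:.3f}")
```

Because PC scores are mutually uncorrelated, the training R² of a PCR model equals the sum of squared correlations of the selected PCs with the response. For a fixed number of PCs, correlation ranking therefore never fits the training data worse than eigenvalue ranking; whether the advantage carries over to prediction still has to be checked by validation.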
