Modified particle swarm optimization method for variable selection in QSAR/QSPR studies

Selecting the most relevant variables is an important step in QSAR/QSPR modeling. In this work, we apply a modified particle swarm optimization (MPSO) algorithm, coupled with multiple linear regression (MLR), to select a small subset of descriptors that contribute significantly to the Gibbs energy of formation of a diverse set of organic compounds. Nonlinear relationships between the selected molecular descriptors and the Gibbs energy of formation are then modeled with a radial basis function neural network (RBF NN), an adaptive neuro-fuzzy inference system (ANFIS), and a support vector machine (SVM). The squared correlation coefficients of the MLR, RBF NN, ANFIS, and SVM models are 0.928, 0.946, 0.945, and 0.947, respectively. These results suggest that the proposed MPSO is an efficient and powerful method for feature (descriptor) selection in QSAR/QSPR studies.
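To make the idea of swarm-based descriptor selection with an MLR objective concrete, the following is a minimal sketch only. It uses a plain binary PSO with a sigmoid transfer function, not the modified algorithm of this work, whose details are not given in the abstract. The synthetic data, the penalty-based fitness, all parameter values, and the function names `mlr_r2`, `fitness`, and `binary_pso_select` are assumptions made for illustration.

```python
import numpy as np


def mlr_r2(X, y, mask):
    """Squared correlation coefficient (R^2) of an MLR fit on the selected columns."""
    if not mask.any():
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, mask]])  # intercept + chosen descriptors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def fitness(X, y, mask, penalty=0.01):
    """Assumed objective: R^2 minus a per-descriptor penalty, favouring small subsets."""
    return mlr_r2(X, y, mask) - penalty * int(mask.sum())


def binary_pso_select(X, y, n_particles=20, n_iter=50,
                      w=0.7, c1=1.5, c2=1.5, penalty=0.01, seed=0):
    """Plain binary PSO (illustrative, not the paper's MPSO)."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pos = rng.random((n_particles, n_feat)) < 0.5        # each particle is a bit mask
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_feat))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(X, y, p, penalty) for p in pos])
    g = int(pbest_fit.argmax())
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])

    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        x = pos.astype(float)
        # Standard velocity update: inertia + pull toward personal and global bests.
        vel = (w * vel
               + c1 * r1 * (pbest.astype(float) - x)
               + c2 * r2 * (gbest.astype(float) - x))
        vel = np.clip(vel, -4.0, 4.0)
        # A sigmoid of the velocity gives the probability that each bit is set.
        pos = rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))

        fit = np.array([fitness(X, y, p, penalty) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(pbest_fit.argmax())
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    return gbest, gbest_fit


# Synthetic stand-in for a descriptor matrix: 60 "compounds", 10 "descriptors",
# of which only columns 0, 3 and 7 actually drive the response.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 1.0 * X[:, 7] + rng.normal(scale=0.1, size=60)

mask, best_fit = binary_pso_select(X, y)
print("selected descriptor indices:", np.flatnonzero(mask))
print("R^2 of MLR on selected subset:", round(mlr_r2(X, y, mask), 3))
```

The penalty term is one simple way to express the preference for a small descriptor subset. In practice, a cross-validated statistic would be a more robust objective than the training R^2 used here.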
