Feature selection for structure-activity correlation using binary particle swarms.

We present a new feature selection algorithm for structure-activity and structure-property correlation based on particle swarms. Particle swarms explore the search space through a population of individuals that adapt by returning stochastically toward previously successful regions, influenced by the success of their neighbors. This method, which was originally intended for searching multidimensional continuous spaces, is adapted to the problem of feature selection by viewing the location vectors of the particles as probabilities and employing roulette wheel selection to construct candidate subsets. The algorithm is applied in the construction of parsimonious quantitative structure-activity relationship (QSAR) models based on feed-forward neural networks and is tested on three classical data sets from the QSAR literature. It is shown that the method compares favorably with simulated annealing and is able to identify a better and more diverse set of solutions given the same amount of simulation time.
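
The adaptation described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a fixed subset size k, a neighborhood equal to the whole swarm (global best), a standard inertia-weight velocity update, and positions clipped to [0, 1]. The names roulette_subset, toy_fitness, and binary_pso are hypothetical, as are all parameter values. The toy fitness simply counts how many "relevant" features a subset contains and stands in for the neural network model quality used in the paper.

```python
import random

def roulette_subset(weights, k, rng):
    """Draw k distinct feature indices; each pick is proportional to its weight."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        # Small floor keeps the wheel valid even if every weight is zero.
        w = [max(weights[i], 1e-12) for i in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return sorted(chosen)

def toy_fitness(subset, relevant):
    """Stand-in score (higher is better): number of relevant features selected."""
    return len(set(subset) & relevant)

def binary_pso(n_features, k, fitness, n_particles=20, n_iter=50,
               inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Particle swarm over selection weights; subsets are sampled by roulette wheel."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest_pos = [p[:] for p in pos]
    pbest_fit = [float("-inf")] * n_particles
    gbest_pos, gbest_fit, gbest_subset = pos[0][:], float("-inf"), []

    for _ in range(n_iter):
        # Evaluate: each particle's position vector weights the roulette wheel.
        for i in range(n_particles):
            subset = roulette_subset(pos[i], k, rng)
            f = fitness(subset)
            if f > pbest_fit[i]:
                pbest_fit[i], pbest_pos[i] = f, pos[i][:]
            if f > gbest_fit:
                gbest_fit, gbest_pos, gbest_subset = f, pos[i][:], subset
        # Move: drift stochastically toward personal and swarm-best positions.
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Assumption: clip so positions remain usable as selection weights.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
    return gbest_subset, gbest_fit

if __name__ == "__main__":
    relevant = {1, 4, 7}  # hypothetical "informative" descriptors
    best, score = binary_pso(10, 3, lambda s: toy_fitness(s, relevant))
    print(best, score)
```

Because each subset is sampled rather than read directly off the position, the same particle can yield different subsets on different iterations. This is one plausible reading of how treating locations as probabilities keeps the candidate solutions diverse.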
