Variable selection methods: an introduction

QSAR analysis typically uses molecular descriptors as the independent variables of regression and classification models. The number of available molecular descriptors has grown enormously over time, and dedicated software can now calculate thousands of them, able to describe different aspects of a molecule. However, when modelling a particular property or biological activity, it is reasonable to assume that only a small number of descriptors are actually correlated with the experimental response and are therefore relevant for building the mathematical model of interest.
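The assumption that only a few descriptors carry information about the response can be illustrated with a minimal sketch. It is not a method taken from the text: it uses a simple univariate correlation filter on synthetic data, and the function names (`pearson`, `rank_descriptors`), the data sizes, and the coefficients are all hypothetical choices made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_descriptors(X, y):
    """Return descriptor indices sorted by decreasing |correlation| with y."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Synthetic data (assumption, not real QSAR data): 300 molecules and
# 50 descriptors, of which only descriptors 0, 1 and 2 affect the response.
rng = random.Random(0)
n_molecules, n_descriptors = 300, 50
X = [[rng.gauss(0, 1) for _ in range(n_descriptors)] for _ in range(n_molecules)]
y = [3.0 * row[0] - 2.5 * row[1] + 2.0 * row[2] + rng.gauss(0, 0.5) for row in X]

# Keep the three descriptors most strongly correlated with the response.
top3 = rank_descriptors(X, y)[:3]
print(sorted(top3))
```

A filter of this kind scores each descriptor separately, so it can miss descriptors that matter only in combination and can be misled by correlated descriptors. It is meant only to make the sparsity assumption concrete.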
