Comprehensive study of feature selection methods to solve the multicollinearity problem according to evaluation criteria

Abstract This paper presents a new approach to feature selection based on the concept of feature filters, so that feature selection is independent of the prediction model. Data fitting is stated as a single-objective optimization problem, where the objective function measures the error of approximating the target vector by some function of the given features. Linear dependence between features induces the multicollinearity problem, leading to model instability and a redundant feature set. This paper introduces a feature selection method based on quadratic programming. The approach accounts for the mutual dependence of the features and their dependence on the target vector, and selects features according to relevance and similarity measures defined for the specific problem. The main idea is to minimize mutual dependence and maximize approximation quality by varying a binary vector that indicates the presence of each feature. The resulting model is less redundant and more stable. To evaluate the quality of the proposed feature selection method and compare it with others, we use several criteria that measure instability and redundancy. In our experiments we compare the proposed approach with several other feature selection methods and show that the quadratic programming approach gives superior results, according to the criteria considered, on both the test and real data sets.
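The quadratic programming formulation the abstract outlines can be sketched as follows. This is a minimal illustration, not the authors' exact method: the choice of absolute Pearson correlation for both the similarity matrix Q and the relevance vector b, the trade-off parameter `alpha`, and the relaxation of the binary indicator vector to continuous weights on the simplex are all assumptions made here for the sake of a runnable example.

```python
import numpy as np
from scipy.optimize import minimize

def qp_feature_selection(X, y, alpha=0.7):
    """Relaxed QP feature selection sketch: penalize pairwise feature
    similarity (quadratic term, matrix Q) while rewarding each feature's
    relevance to the target (linear term, vector b)."""
    n_features = X.shape[1]
    # Similarity between features: absolute pairwise Pearson correlation.
    Q = np.abs(np.corrcoef(X, rowvar=False))
    # Relevance of each feature: absolute correlation with the target.
    b = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    # Relaxed problem:  min (1-alpha) x^T Q x - alpha b^T x
    #                   s.t. x >= 0, sum(x) = 1
    # (the binary presence indicator is relaxed to simplex weights).
    objective = lambda x: (1 - alpha) * x @ Q @ x - alpha * (b @ x)
    constraint = {"type": "eq", "fun": lambda x: x.sum() - 1.0}
    bounds = [(0.0, 1.0)] * n_features
    x0 = np.full(n_features, 1.0 / n_features)
    res = minimize(objective, x0, method="SLSQP",
                   bounds=bounds, constraints=constraint)
    return res.x  # importance weights; threshold them to select features

# Toy data: feature 0 drives the target, feature 1 is a near-duplicate
# of feature 0 (multicollinear pair), feature 2 is pure noise.
rng = np.random.default_rng(0)
x_rel = rng.normal(size=200)
X = np.column_stack([x_rel,
                     x_rel + 0.01 * rng.normal(size=200),
                     rng.normal(size=200)])
y = x_rel + 0.1 * rng.normal(size=200)
w = qp_feature_selection(X, y)
```

Under these assumptions the irrelevant noise feature receives a weight near zero, while the collinear pair shares the remaining mass, which is the trade-off between relevance and redundancy that the paper's evaluation criteria are designed to measure.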
