Subset Selection by Pareto Optimization

Selecting the optimal subset from a large set of variables is a fundamental problem in various learning tasks, such as feature selection, sparse regression, and dictionary learning. In this paper, we propose the POSS approach, which employs evolutionary Pareto optimization to find a small-sized subset with good performance. We prove that, for sparse regression, POSS efficiently achieves the best theoretically guaranteed approximation performance known so far. In particular, for the Exponential Decay subclass, POSS is proven to find an optimal solution. An empirical study verifies the theoretical results and shows that POSS outperforms greedy and convex relaxation methods.
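As a rough illustration of how Pareto optimization can be applied to subset selection, here is a minimal Python sketch in the spirit of POSS; it is not the authors' implementation. It treats subset selection as a bi-objective problem: minimize the loss f(s) and the subset size |s| simultaneously, keep an archive of mutually non-dominated subsets, and finally return the best archived subset with at most k variables. Several details are assumptions not stated in the abstract: bit-wise mutation with rate 1/n, subsets of size 2k or more treated as infeasible, the mean squared error of an intercept-free least-squares fit as the sparse-regression loss, and the helper names `poss` and `mse`, which are hypothetical.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

Subset = FrozenSet[int]
Objectives = Tuple[float, int]  # (loss, subset size), both minimized


def mse(X: List[List[float]], y: List[float], s: Subset) -> float:
    """Hypothetical sparse-regression loss: mean squared error of the
    least-squares fit of y on the columns in s (no intercept)."""
    cols = sorted(s)
    m, d = len(y), len(cols)
    if d == 0:
        return sum(v * v for v in y) / m
    # Augmented normal equations [X_s^T X_s | X_s^T y].
    G = [[sum(X[r][a] * X[r][b] for r in range(m)) for b in cols]
         + [sum(X[r][a] * y[r] for r in range(m))] for a in cols]
    # Gaussian elimination with partial pivoting.
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(G[r][c]))
        G[c], G[p] = G[p], G[c]
        if abs(G[c][c]) < 1e-12:
            return float("inf")  # assumption: reject (near-)collinear subsets
        for r in range(c + 1, d):
            factor = G[r][c] / G[c][c]
            for j in range(c, d + 1):
                G[r][j] -= factor * G[c][j]
    # Back-substitution for the regression weights.
    w = [0.0] * d
    for c in reversed(range(d)):
        w[c] = (G[c][d] - sum(G[c][j] * w[j] for j in range(c + 1, d))) / G[c][c]
    resid = [y[r] - sum(w[j] * X[r][cols[j]] for j in range(d)) for r in range(m)]
    return sum(e * e for e in resid) / m


def poss(f: Callable[[Subset], float], n: int, k: int, iters: int, seed: int = 0) -> Subset:
    """Pareto-optimization sketch: minimize (f(s), |s|) jointly, then return
    the archived subset with the lowest loss among those with |s| <= k."""
    rng = random.Random(seed)

    def evaluate(s: Subset) -> Objectives:
        # Assumption: subsets with 2k or more variables get infinite loss.
        return (float("inf") if len(s) >= 2 * k else f(s), len(s))

    def weakly_dominates(a: Objectives, b: Objectives) -> bool:
        return a[0] <= b[0] and a[1] <= b[1]

    start: Subset = frozenset()
    archive: List[Tuple[Subset, Objectives]] = [(start, evaluate(start))]
    for _ in range(iters):
        parent = rng.choice(archive)[0]
        # Bit-wise mutation: flip each variable's membership with probability 1/n.
        flips = {i for i in range(n) if rng.random() < 1.0 / n}
        child = frozenset(parent ^ flips)
        obj = evaluate(child)
        # Discard the child if some archived subset strictly dominates it.
        if any(weakly_dominates(o, obj) and o != obj for _s, o in archive):
            continue
        # Otherwise drop everything the child weakly dominates, then keep the child.
        archive = [(s, o) for s, o in archive if not weakly_dominates(obj, o)]
        archive.append((child, obj))
    feasible = [(o[0], s) for s, o in archive if o[1] <= k]
    return min(feasible, key=lambda t: t[0])[1]
```

Because the archive keeps at most one subset per size, a single run yields a whole trade-off curve between loss and subset size; the last step simply reads off the best entry with |s| <= k. For example, with a target built from two columns of a random design, such as `y = 2*x1 - 3*x3`, a run of `poss(lambda s: mse(X, y, s), n=6, k=2, iters=5000)` should typically return `frozenset({1, 3})`.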
