Selecting Diverse Features via Spectral Regularization

We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons, including interpretability and robustness to noise. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. When added to the linear regression objective, these regularizers yield approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms with provable guarantees. We compare our algorithms to traditional greedy and ℓ1-regularization schemes and show that we obtain a more diverse set of features, which makes the regression problem more stable under perturbations. A sketch of the greedy scheme appears below.
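
The abstract does not state which spectral regularizer is used, so the following is only a minimal sketch of the general idea, assuming a log-determinant of the selected covariance submatrix as the diversity term (a classical submodular spectral function of a principal submatrix) and standardized features; the function name greedy_diverse_selection, the weight lam, and the small ridge term are illustrative, not the paper's exact formulation.

```python
import numpy as np

def greedy_diverse_selection(X, y, k, lam=0.1):
    """Greedily pick k features maximizing R^2 plus a log-det diversity term.

    X   : (n, d) standardized design matrix
    y   : (n,) centered response
    k   : number of features to select
    lam : weight on the diversity regularizer (illustrative choice)
    """
    n, d = X.shape
    C = (X.T @ X) / n          # feature covariance (correlation if standardized)
    b = (X.T @ y) / n          # feature-response covariances
    var_y = (y @ y) / n        # variance of the centered response

    def score(S):
        if not S:
            return 0.0
        idx = list(S)
        # small ridge for numerical stability on nearly collinear subsets
        C_S = C[np.ix_(idx, idx)] + 1e-8 * np.eye(len(idx))
        b_S = b[idx]
        # R^2 of the least-squares fit restricted to the selected features
        r2 = b_S @ np.linalg.solve(C_S, b_S) / var_y
        # log-det rewards near-orthogonal (diverse) feature subsets
        _, logdet = np.linalg.slogdet(C_S)
        return r2 + lam * logdet

    selected = set()
    for _ in range(k):
        best_j, best_gain = None, -np.inf
        for j in range(d):
            if j in selected:
                continue
            gain = score(selected | {j}) - score(selected)
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.add(best_j)
    return sorted(selected)
```

With standardized columns the log-determinant is zero for orthogonal features and increasingly negative for correlated ones, so the regularizer penalizes redundant selections while the R² term drives predictive accuracy.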
