Sparse algorithms are not stable: A no-free-lunch theorem

We consider two widely used notions in machine learning: sparsity and algorithmic stability. Both notions are deemed desirable in designing algorithms, and both are believed to lead to good generalization ability. In this paper, we show that these two notions contradict each other: a sparse algorithm cannot be stable, and vice versa. Thus, one has to trade off sparsity and stability when designing a learning algorithm. In particular, our general result implies that ℓ1-regularized regression (Lasso) cannot be stable, while ℓ2-regularized regression is known to have strong stability properties.
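
To make the contrast concrete, below is a minimal sketch (not taken from the paper) of how the claim can be probed empirically: it refits the Lasso and ridge regression after removing a single training sample and compares how much each solution moves. The synthetic data, the regularization strengths, and the use of scikit-learn are all illustrative assumptions.

```python
# Illustrative sketch only: probes the claim that l1-regularized regression
# (Lasso) is sensitive to single-sample perturbations, while l2-regularized
# (ridge) regression is not. Data, regularization strengths, and the use of
# scikit-learn are assumptions, not the paper's construction.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, d = 30, 10
X = rng.standard_normal((n, d))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)  # near-duplicate feature
y = X[:, 0] + 0.1 * rng.standard_normal(n)

def fit_coefs(model, X, y):
    """Return a copy of the fitted coefficient vector."""
    return model.fit(X, y).coef_.copy()

for name, model in [("lasso", Lasso(alpha=0.1)), ("ridge", Ridge(alpha=0.1))]:
    w_full = fit_coefs(model, X, y)           # fit on all n samples
    w_loo = fit_coefs(model, X[1:], y[1:])    # fit with one sample removed
    print(f"{name}: max |coef change| = {np.max(np.abs(w_full - w_loo)):.4f}, "
          f"support = {np.flatnonzero(np.abs(w_full) > 1e-8)}")
```

The paper's notion of stability concerns the worst-case change in the learned predictor's loss when one training point is perturbed; the sketch above only measures a simpler proxy (coefficient movement under leave-one-out), but it illustrates why selecting a sparse support can make the solution fragile.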
