Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem

We consider two desirable properties of learning algorithms: sparsity and algorithmic stability. Both properties are believed to lead to good generalization ability. We show that these two properties are fundamentally at odds with each other: a sparse algorithm cannot be stable, and vice versa. Thus, one has to trade off sparsity and stability when designing a learning algorithm. In particular, our general result implies that ℓ1-regularized regression (Lasso) cannot be stable, while ℓ2-regularized regression is known to have strong stability properties and is therefore not sparse.
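
For concreteness, the following sketch records one standard way the two properties are formalized; the symbols λ and β_n and the uniform-stability definition (in the sense of Bousquet and Elisseeff) are supplied here for illustration and are not part of the abstract itself.

\[
\hat{w}_{\mathrm{lasso}} \in \arg\min_{w}\; \|y - Xw\|_2^2 + \lambda \|w\|_1,
\qquad
\hat{w}_{\mathrm{ridge}} = \arg\min_{w}\; \|y - Xw\|_2^2 + \lambda \|w\|_2^2 .
\]

\[
\text{Uniform stability:}\quad
\bigl|\,\ell(A_S, z) - \ell(A_{S^{\setminus i}}, z)\,\bigr| \le \beta_n
\quad \text{for all samples } S \text{ of size } n,\ \text{all } i,\ \text{all } z,
\]

where "stable" typically means β_n = O(1/n). The ℓ1 penalty drives coordinates of the solution exactly to zero (sparsity), whereas the ℓ2 penalty only shrinks them; regularized least squares with the ℓ2 penalty is known to be uniformly stable with β_n roughly of order 1/(λn), while the theorem above says that no algorithm returning genuinely sparse solutions, Lasso included, can admit such a bound.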
