Aggregation and minimax optimality in high-dimensional estimation

Aggregation is a popular technique in statistics and machine learning. Given a collection of estimators, the problem of linear, convex, or model-selection-type aggregation consists in constructing a new estimator, called the aggregate, which is nearly as good as the best among them (or nearly as good as their best linear or convex combination) with respect to a given risk criterion. When the underlying model is sparse, meaning that it is well approximated by a linear combination of a small number of functions from the dictionary, aggregation techniques turn out to be very useful in taking advantage of sparsity. On the other hand, aggregation is a general way of constructing adaptive nonparametric estimators, one more powerful than the classical methods since it allows one to combine estimators of a different nature. Aggregates are usually constructed by mixing the initial estimators, or the functions of the dictionary, with data-dependent weights that can be defined in several possible ways. An important example is given by aggregates with exponential weights. They satisfy sharp oracle inequalities that allow one to treat three different problems in a unified way: adaptive nonparametric estimation, aggregation, and sparse estimation.

Mathematics Subject Classification (2010). Primary 62G05; Secondary 62J07.
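To make the exponential-weights idea concrete, here is a minimal sketch in Python (not from the paper; the function name, the choice of empirical squared risk, and the temperature parameter `beta` are illustrative assumptions). Each candidate estimator receives a weight proportional to the exponential of its negative empirical risk, and the aggregate is the corresponding weighted mixture of their predictions:

```python
import numpy as np

def exponential_weights_aggregate(preds, y, beta=1.0):
    """Mix candidate predictions with exponential weights.

    preds : (M, n) array, row j holds the predictions of the j-th estimator
    y     : (n,) array of observed responses
    beta  : temperature; larger beta spreads weight more evenly

    Weight of estimator j is proportional to exp(-n * R_j / beta),
    where R_j is its empirical squared risk.
    """
    risks = np.mean((preds - y) ** 2, axis=1)   # empirical risk of each estimator
    n = y.shape[0]
    log_w = -n * risks / beta
    log_w -= log_w.max()                        # shift for numerical stability
    w = np.exp(log_w)
    w /= w.sum()                                # normalize to a convex combination
    return w @ preds, w
```

Because the weights form a convex combination concentrated on low-risk estimators, the aggregate's risk tracks that of the best candidate up to a small remainder, which is the content of the sharp oracle inequalities discussed above.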
