Sharp Oracle Inequalities for Aggregation of Affine Estimators

We consider the problem of combining a (possibly uncountably infinite) set of affine estimators in a nonparametric regression model with heteroscedastic Gaussian noise. Focusing on the exponentially weighted aggregate, we prove a PAC-Bayesian type inequality that leads to sharp oracle inequalities in both discrete and continuous settings. The framework is general enough to cover combinations of various procedures, such as least-squares regression, kernel ridge regression, shrinkage estimators, and many other estimators used in the literature on statistical inverse problems. As a consequence, we show that the proposed aggregate provides an adaptive estimator in the exact minimax sense without either discretizing the range of tuning parameters or splitting the set of observations. We also illustrate numerically the good performance achieved by the exponentially weighted aggregate.
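To make the idea of exponentially weighted aggregation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a finite family of coordinate-wise affine estimators of the form a * y + b, known noise variances `sigma2`, a uniform prior, and the standard unbiased risk estimate for affine estimators (squared residual plus twice the trace of the noise covariance times the linear part, minus the total noise variance). The function names `unbiased_risk` and `ewa`, the toy data, and the temperature choice `beta = 8 * max(sigma2)` are illustrative assumptions and do not come from the abstract.

```python
import math


def unbiased_risk(y, a, b, sigma2):
    """Unbiased risk estimate of the coordinate-wise affine estimator a*y + b.

    Assumes independent Gaussian noise with known variances sigma2[i].
    """
    n = len(y)
    resid = sum((yi - (ai * yi + bi)) ** 2 for yi, ai, bi in zip(y, a, b))
    trace = sum(ai * si for ai, si in zip(a, sigma2))  # Tr(Sigma A), A diagonal
    return (resid + 2.0 * trace - sum(sigma2)) / n


def ewa(y, estimators, sigma2, beta, prior=None):
    """Exponentially weighted aggregate over a finite list of (a, b) pairs.

    Weights are proportional to prior * exp(-n * risk_estimate / beta).
    """
    n = len(y)
    k = len(estimators)
    prior = prior or [1.0 / k] * k  # uniform prior (assumption)
    log_w = [
        math.log(p) - n * unbiased_risk(y, a, b, sigma2) / beta
        for p, (a, b) in zip(prior, estimators)
    ]
    top = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    w = [v / total for v in w]
    fits = [[ai * yi + bi for yi, ai, bi in zip(y, a, b)] for a, b in estimators]
    aggregate = [sum(wj * fit[i] for wj, fit in zip(w, fits)) for i in range(n)]
    return aggregate, w


# Toy data with heteroscedastic noise levels (hypothetical values).
y = [3.0, -2.0, 0.5, 0.1]
sigma2 = [1.0, 0.5, 2.0, 1.0]
zero = ([0.0] * 4, [0.0] * 4)      # always predicts 0
identity = ([1.0] * 4, [0.0] * 4)  # returns the data unchanged
half = ([0.5] * 4, [0.0] * 4)      # shrinks the data by one half
beta = 8.0 * max(sigma2)           # temperature; an assumed conservative choice
aggregate, weights = ewa(y, [zero, identity, half], sigma2, beta)
```

In this toy setting the aggregate is a convex combination of the three candidate fits, and candidates with a smaller estimated risk receive a larger weight. A continuous family of tuning parameters would replace the finite sum by an integral against the prior, which this sketch does not attempt.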
