Optimal aggregation of affine estimators

We consider the problem of combining a (possibly uncountably infinite) set of affine estimators in a nonparametric regression model with heteroscedastic Gaussian noise. Focusing on the exponentially weighted aggregate, we prove a PAC-Bayesian type inequality that leads to sharp oracle inequalities in both discrete and continuous settings. The framework is general enough to cover combinations of various procedures used in the literature on statistical inverse problems, such as least-squares regression, kernel ridge regression, and shrinkage estimators. As a consequence, we show that the proposed aggregate provides an adaptive estimator in the exact minimax sense without either discretizing the range of tuning parameters or splitting the set of observations. We also illustrate numerically the good performance achieved by the exponentially weighted aggregate.
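To make the object of study concrete, here is a minimal sketch of an exponentially weighted aggregate over a finite family of affine estimators of the form A y + b, assuming independent Gaussian noise with known variances. The function name `ewa_affine`, its arguments, the uniform default prior, and the temperature `beta` are illustrative assumptions rather than the paper's notation or its conditions. Each estimator is weighted by the exponential of minus its unbiased risk estimate, ||y - f||^2 + 2 Tr(A Sigma) - Tr(Sigma), divided by `beta`.

```python
import numpy as np

def ewa_affine(y, mats, offsets, sigma2, beta, prior=None):
    """Exponentially weighted aggregate of affine estimators f_k = A_k y + b_k.

    Hypothetical helper for illustration only.
    y       : observations, shape (n,)
    mats    : list of (n, n) matrices A_k
    offsets : list of (n,) vectors b_k
    sigma2  : per-coordinate noise variances (heteroscedastic), shape (n,)
    beta    : temperature parameter (assumed given; not the paper's condition)
    prior   : prior weights over the family (uniform if None)
    """
    y = np.asarray(y, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    m = len(mats)
    if prior is None:
        prior = np.full(m, 1.0 / m)
    else:
        prior = np.asarray(prior, dtype=float)

    # Predictions of every affine estimator, shape (m, n).
    preds = np.stack([A @ y + b for A, b in zip(mats, offsets)])

    # Unbiased risk estimates: ||y - f||^2 + 2 Tr(A Sigma) - Tr(Sigma),
    # with Sigma = diag(sigma2), so Tr(A Sigma) = sum_i A_ii * sigma2_i.
    risks = np.array([
        np.sum((y - f) ** 2) + 2.0 * np.sum(np.diag(A) * sigma2) - np.sum(sigma2)
        for A, f in zip(mats, preds)
    ])

    # Weights proportional to prior * exp(-risk / beta), computed stably.
    logw = np.log(prior) - risks / beta
    logw -= logw.max()
    w = np.exp(logw)
    w /= w.sum()

    # The aggregate is the weighted average of the individual predictions.
    return w @ preds, w
```

As a usage example, one could aggregate a few shrinkage estimators A = I / (1 + lam) with zero offsets by calling `ewa_affine(y, mats, offsets, sigma2, beta=4 * sigma2.max())`; the temperature value here is an arbitrary illustrative choice. A continuous family of tuning parameters would replace the finite sum with an integral against the prior, which this sketch does not attempt.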
