On the Exponentially Weighted Aggregate with the Laplace Prior

In this paper, we study the statistical behaviour of the Exponentially Weighted Aggregate (EWA) in high-dimensional regression with fixed design. Under the assumption that the underlying regression vector is sparse, it is reasonable to use the Laplace distribution as a prior. The resulting estimator, and in particular the instance of it known as the Bayesian lasso, has already been used in the statistical literature because of its computational convenience, even though no thorough mathematical analysis of its statistical properties had been carried out. The present work fills this gap by establishing sharp oracle inequalities for the EWA with the Laplace prior. These inequalities show that, when the temperature parameter is small, the EWA with the Laplace prior satisfies the same type of oracle inequality as the lasso estimator, provided that the quality of estimation is measured by the prediction loss. Extensions of the proposed methodology to the problem of prediction with low-rank matrices are also considered.
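For concreteness, here is the standard formulation of the estimator under study (an assumption based on the usual EWA definition; the paper's exact normalization and tuning may differ). Given data Y = X\theta^* + noise with design matrix X \in \mathbb{R}^{n \times p}, the EWA with the Laplace prior is the mean of the pseudo-posterior

\hat\pi(\theta) \propto \exp\Big(-\tfrac{1}{\beta}\,\|Y - X\theta\|_2^2\Big)\,\exp\big(-\lambda\|\theta\|_1\big), \qquad \hat\theta_{\mathrm{EWA}} = \int_{\mathbb{R}^p} \theta\,\hat\pi(\theta)\,d\theta,

where \beta > 0 is the temperature and \lambda > 0 is the scale of the Laplace prior. A standard way to approximate this integral is Langevin Monte Carlo. The sketch below is illustrative rather than the paper's algorithm: the function name, step-size heuristic, and iteration counts are assumptions of this example, and the non-smooth l1 term is handled with a simple subgradient.

```python
import numpy as np

def ewa_laplace_lmc(X, Y, beta, lam, n_iter=50_000, burn_in=10_000, step=None, seed=0):
    """Approximate the EWA with Laplace prior by (sub)gradient Langevin Monte Carlo.

    Target density: pi_hat(theta) ~ exp(-||Y - X theta||^2 / beta - lam * ||theta||_1).
    Returns the running average of the post-burn-in iterates, an estimate of the
    pseudo-posterior mean, i.e. the EWA.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if step is None:
        # Crude step size from the Lipschitz constant of the smooth part of -log pi_hat.
        L = 2.0 * np.linalg.norm(X, 2) ** 2 / beta
        step = 1.0 / (L + lam)
    theta = np.zeros(p)
    mean, count = np.zeros(p), 0
    for t in range(n_iter):
        # (Sub)gradient of -log pi_hat: quadratic data-fit term plus lam * sign(theta).
        grad = (2.0 / beta) * (X.T @ (X @ theta - Y)) + lam * np.sign(theta)
        # Unadjusted Langevin step: gradient descent plus injected Gaussian noise.
        theta = theta - step * grad + np.sqrt(2.0 * step) * rng.standard_normal(p)
        if t >= burn_in:
            count += 1
            mean += (theta - mean) / count
    return mean
```

Usage: theta_hat = ewa_laplace_lmc(X, Y, beta=1.0, lam=1.0) returns an approximation of \hat\theta_{\mathrm{EWA}}. Note that the unadjusted Langevin algorithm is biased for any fixed step size, and the non-smooth l1 term can be treated more carefully, e.g. with proximal variants of the sampler.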
