Bayesian Penalty Mixing: The Case of a Non-separable Penalty

Separable penalties for sparse vector recovery are plentiful throughout statistical methodology and theory. Here, we confine attention to the problem of estimating sparse high-dimensional normal means. Separable penalized likelihood estimators are known to have a Bayesian interpretation as posterior modes under independent product priors. Such estimators can achieve rate-minimax performance when the correct level of sparsity is known. A fully Bayes approach, on the other hand, mixes the product priors over a shared complexity parameter. These constructions can yield a self-adaptive posterior that achieves rate-minimax performance when the sparsity level is unknown. Such optimality has also been established for posterior mean functionals. However, less is known about posterior modes in these setups. Ultimately, the mixing priors render the coordinates dependent through a penalty that is no longer separable. By tying the coordinates together, the hope is to gain adaptivity and achieve automatic hyperparameter tuning. Here, we study two examples of fully Bayes penalties: the fully Bayes LASSO and the fully Bayes Spike-and-Slab LASSO of Rockova and George (The Spike-and-Slab LASSO, Submitted). We discuss discrepancies and highlight the benefits of the two-group prior variant. We develop an Appell function apparatus for coping with adaptive selection thresholds. We show that the fully Bayes treatment of a complexity parameter is tantamount to oracle hyperparameter choice for sparse normal mean estimation.
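
To make the loss of separability concrete, here is a minimal worked sketch. It assumes a Laplace product prior with a shared rate $\lambda$ and, purely for illustration, a Gamma$(a,b)$ hyperprior on $\lambda$. The hyperprior and the symbols $a$, $b$, $n$ are assumptions introduced here and are not necessarily the choices made in the paper.

```latex
% Conditionally on \lambda the prior is a product, so the penalty is separable:
\pi(\theta \mid \lambda) = \prod_{i=1}^{n} \frac{\lambda}{2}\, e^{-\lambda |\theta_i|},
\qquad
-\log \pi(\theta \mid \lambda) = \lambda \sum_{i=1}^{n} |\theta_i| + \text{const}.

% Mixing over \lambda \sim \mathrm{Gamma}(a, b) (illustrative assumption):
\pi(\theta)
  = \int_0^\infty \Big(\frac{\lambda}{2}\Big)^{n} e^{-\lambda \|\theta\|_1}\,
    \frac{b^{a}}{\Gamma(a)}\, \lambda^{a-1} e^{-b\lambda}\, d\lambda
  = \frac{b^{a}\,\Gamma(n+a)}{2^{n}\,\Gamma(a)}\,
    \big(b + \|\theta\|_1\big)^{-(n+a)}.

% The implied penalty is no longer a sum of coordinate-wise terms:
-\log \pi(\theta) = (n+a)\,\log\!\big(b + \|\theta\|_1\big) + \text{const},
\qquad
\frac{\partial \{-\log \pi(\theta)\}}{\partial |\theta_i|}
  = \frac{n+a}{\,b + \|\theta\|_1\,}.
```

Under these assumptions, the effective LASSO-type penalty level applied to each coordinate, $(n+a)/(b+\|\theta\|_1)$, depends on all the other coordinates through $\|\theta\|_1$. This is the sense in which mixing ties the coordinates together and lets the penalty adapt to the overall signal.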
