The horseshoe estimator for sparse signals

This paper proposes a new approach to sparsity, the horseshoe estimator, which arises from a prior based on multivariate-normal scale mixtures. We describe the estimator's advantages over existing approaches, including its robustness, its adaptivity to different sparsity patterns, and its analytical tractability. We prove two theorems: one characterizes the horseshoe estimator's tail robustness, and the other demonstrates a super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations. Finally, using both real and simulated data, we show that the answers given by the horseshoe estimator correspond closely to those obtained by Bayesian model averaging under a point-mass mixture prior. Copyright 2010, Oxford University Press.
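
As a rough illustration of the shrinkage behaviour summarized above, the sketch below computes a horseshoe posterior mean for a single observation by numerical integration. It is not taken from the paper's text. It assumes the usual hierarchical form of the horseshoe prior: y | theta ~ N(theta, 1), theta | lambda ~ N(0, lambda^2 tau^2), with a standard half-Cauchy prior on the local scale lambda. The function name, the fixed global scale `tau`, the unit noise variance, and the grid size are all illustrative assumptions.

```python
import numpy as np


def horseshoe_posterior_mean(y, tau=1.0, n_grid=20000):
    """Posterior mean of theta given one observation y (illustrative sketch).

    Assumed model: y | theta ~ N(theta, 1), theta | lam ~ N(0, lam^2 tau^2),
    lam ~ half-Cauchy(0, 1). The global scale tau is held fixed here.
    """
    # If u is uniform on (0, pi/2), then lam = tan(u) is standard half-Cauchy,
    # so a midpoint grid in u integrates against the half-Cauchy prior.
    u = (np.arange(n_grid) + 0.5) * (np.pi / 2) / n_grid
    lam2 = np.tan(u) ** 2

    # Marginally, y | lam ~ N(0, 1 + tau^2 lam^2); constants cancel in the ratio.
    var = 1.0 + tau**2 * lam2
    weight = np.exp(-0.5 * y**2 / var) / np.sqrt(var)

    # Given lam, E[theta | y, lam] = (1 - kappa) * y with kappa = 1 / var.
    shrink = 1.0 - 1.0 / var

    return y * np.sum(weight * shrink) / np.sum(weight)


if __name__ == "__main__":
    for obs in (0.5, 2.0, 10.0):
        print(obs, horseshoe_posterior_mean(obs))
```

Under these assumptions, small observations are pulled strongly toward zero while large observations are left nearly unshrunk, which is the tail-robustness property the abstract refers to.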
