Statistical Inference for Bayesian Risk Minimization via Exponentially Tilted Empirical Likelihood

The celebrated Bernstein-von Mises theorem ensures that credible regions from a Bayesian posterior are well calibrated when the model is correctly specified, in the frequentist sense that their coverage probabilities tend to the nominal values as data accrue. However, this conventional Bayesian framework is known to lack robustness when the model is misspecified or only partly specified, as in quantile regression, risk-minimization-based supervised and unsupervised learning, and robust estimation. To overcome this difficulty, we propose a new Bayesian inferential approach that replaces the (misspecified or partly specified) likelihood with a proper exponentially tilted empirical likelihood plus a regularization term. Our surrogate empirical likelihood is constructed by using the first-order optimality condition of empirical risk minimization as the moment condition. We show that the Bayesian posterior obtained by combining this surrogate empirical likelihood with the prior is asymptotically close to a normal distribution centered at the empirical risk minimizer, with a covariance matrix of the appropriate sandwich form. Consequently, the resulting Bayesian credible regions are automatically calibrated and deliver valid uncertainty quantification. Computationally, the proposed method can be implemented with standard Markov chain Monte Carlo sampling algorithms. Our numerical results show that the proposed method tends to be more accurate than existing state-of-the-art competitors.

Keywords: Bayesian inference; Risk minimization; Exponentially tilted empirical likelihood; Gibbs posterior; Misspecified model; Robust estimation
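The construction described in the abstract can be illustrated with a minimal sketch. It assumes the standard exponentially tilted empirical likelihood, in which the observation weights maximize entropy subject to the weighted moment condition being zero, and it uses the gradient of a squared-error loss as the moment function purely for illustration. The regularization term mentioned in the abstract is not specified there and is omitted here. The names `etel_log_likelihood` and `squared_loss_score` are hypothetical.

```python
import numpy as np

def etel_log_likelihood(theta, data, score, max_iter=100, tol=1e-10):
    """Log exponentially tilted empirical likelihood at theta (illustrative sketch).

    score(data, theta) must return an (n, d) array whose rows are the
    per-observation gradients of the loss, i.e. the moment function given
    by the first-order optimality condition of empirical risk minimization.
    The regularization term from the abstract is NOT included.
    """
    g = score(data, theta)                      # (n, d) moment evaluations
    n, d = g.shape
    lam = np.zeros(d)                           # tilting parameter

    def dual(l):
        # Convex dual objective whose minimizer defines the tilted weights.
        return np.mean(np.exp(g @ l))

    # Damped Newton iterations on the dual objective.
    for _ in range(max_iter):
        w = np.exp(g @ lam)
        grad = (w[:, None] * g).mean(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        hess = (g * w[:, None]).T @ g / n
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while dual(lam - t * step) > dual(lam) and t > 1e-8:
            t *= 0.5                            # backtrack if the step overshoots
        lam = lam - t * step

    w = np.exp(g @ lam)
    p = w / w.sum()                             # tilted weights, sum to one
    return float(np.log(p).sum()), p

def squared_loss_score(x, theta):
    # Gradient in theta of 0.5 * ||x - theta||^2 (an assumed example loss).
    return theta - x

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
theta_hat = x.mean(axis=0)                      # empirical risk minimizer

ll_hat, p_hat = etel_log_likelihood(theta_hat, x, squared_loss_score)
ll_off, p_off = etel_log_likelihood(theta_hat + 0.3, x, squared_loss_score)
```

At the empirical risk minimizer the weights are uniform, so the surrogate likelihood peaks there; away from it the weights tilt so that the weighted moment condition still holds. Adding a log-prior to the returned value would give an unnormalized log-posterior that a generic Metropolis sampler could target, though the sketch must only be evaluated at parameter values for which zero lies inside the convex hull of the moment evaluations.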
