Approximate inference for the loss-calibrated Bayesian

We consider the problem of approximate inference in the context of Bayesian decision theory. Traditional approaches focus on approximating general properties of the posterior, ignoring the decision task (and associated losses) for which the posterior could be used. We argue that this can be suboptimal and propose instead to loss-calibrate the approximate inference methods with respect to the decision task at hand. We present a general framework rooted in Bayesian decision theory to analyze approximate inference from the perspective of losses, opening up several research directions. As a first attempt at loss-calibrated approximate inference, we propose an EM-like algorithm on the Bayesian posterior risk and show how it can improve a standard approach to Gaussian process classification when losses are asymmetric.
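
To make the role of asymmetric losses concrete, the following is a minimal sketch of how a Bayesian decision is taken by minimizing posterior risk in binary classification. It is not the EM-like algorithm proposed here; the function names, the loss tables, and the example probability are all illustrative assumptions. It only shows that the same posterior probability can lead to different optimal actions once the losses are asymmetric, which is why an approximation to the posterior may need to be accurate in different places depending on the task.

```python
# Illustrative sketch only: names and numbers are assumptions, not taken from the paper.

def posterior_risk(p_positive, action, loss):
    """Expected loss of `action` when P(y = 1 | x, data) = p_positive.

    `loss[(y, a)]` is the cost of taking action `a` when the true label is `y`.
    """
    return (1.0 - p_positive) * loss[(0, action)] + p_positive * loss[(1, action)]


def bayes_action(p_positive, loss):
    """Return the action in {0, 1} with the smallest posterior risk."""
    return min((0, 1), key=lambda a: posterior_risk(p_positive, a, loss))


# Symmetric 0-1 loss: both kinds of error cost the same.
symmetric = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}

# Asymmetric loss (hypothetical): missing a positive costs ten times more
# than raising a false alarm.
asymmetric = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 10.0, (1, 1): 0.0}

p = 0.3  # hypothetical (approximate) posterior probability of the positive class
print(bayes_action(p, symmetric))   # risk of predicting 0 is 0.3, of predicting 1 is 0.7
print(bayes_action(p, asymmetric))  # risk of predicting 0 is 3.0, of predicting 1 is 0.7
```

Under the symmetric loss the decision threshold on the posterior probability is 0.5, while under this asymmetric loss it moves to 1/11, so errors in the approximate posterior near 1/11 matter far more than errors near 0.5.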
