Posterior distributions are useful for a broad range of machine learning tasks, from model selection to reinforcement learning. Because modern machine learning models can have millions of parameters, selecting an informative prior is typically infeasible, which has led to widespread use of priors that avoid strong assumptions. For example, recent work on deep generative models (Kingma & Welling, 2014; Rezende et al., 2014) commonly places a standard Normal prior on the latent space. However, a prior that is relatively flat is not necessarily uninformative. The Jeffreys prior for the Bernoulli model is a well-known counterexample: Jeffreys (1946) showed that the arcsine distribution, despite its peaks near 0 and 1, is the objective prior with respect to Fisher information, whereas the uniform distribution is not. This suggests that objective priors such as the Jeffreys prior or the related reference prior (Bernardo, 2005) are worth investigating for high-dimensional, web-scale probabilistic models. The challenge is that these priors are difficult to derive for all but the simplest models.
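The Bernoulli counterexample can be checked numerically. The sketch below is illustrative only and is not taken from the cited works; the function names are hypothetical. It assumes the standard definition of the Jeffreys prior as proportional to the square root of the Fisher information, and it uses the fact that the normalizing constant for the Bernoulli case is pi.

```python
import math


def fisher_information(theta):
    """Fisher information of a single Bernoulli(theta) draw.

    Computed as the expected squared score, E[(d/dtheta log p(x | theta))^2].
    """
    score_one = 1.0 / theta             # score when x = 1
    score_zero = -1.0 / (1.0 - theta)   # score when x = 0
    return theta * score_one ** 2 + (1.0 - theta) * score_zero ** 2


def jeffreys_density(theta):
    """Jeffreys prior density: sqrt(I(theta)) divided by its integral over (0, 1), which is pi."""
    return math.sqrt(fisher_information(theta)) / math.pi


def arcsine_density(theta):
    """Density of the arcsine distribution, i.e. Beta(1/2, 1/2)."""
    return 1.0 / (math.pi * math.sqrt(theta * (1.0 - theta)))


if __name__ == "__main__":
    # The two densities should agree, and both should rise sharply near 0 and 1.
    for theta in (0.01, 0.1, 0.5, 0.9, 0.99):
        print(theta, jeffreys_density(theta), arcsine_density(theta))
```

The two densities agree at every point, and the density exceeds the uniform value of 1 near the endpoints while falling below it at 0.5, which is the sense in which the flat prior is not the objective one here.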
[1] Max Welling, et al. Auto-Encoding Variational Bayes. ICLR, 2013.
[2] Richard E. Turner, et al. Variational Inference with Rényi Divergence. ArXiv, 2016.
[3] Sean Gerrish, et al. Black Box Variational Inference. AISTATS, 2013.
[4] Larry A. Wasserman, et al. Iterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk. UAI, 2001.
[5] Yoshua Bengio, et al. Generative Adversarial Nets. NIPS, 2014.
[6] Dustin Tran, et al. Operator Variational Inference. NIPS, 2016.
[7] H. Jeffreys. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 1946.
[8] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML, 2014.
[9] J. Bernardo, et al. The Formal Definition of Reference Priors. 0904.0156, 2009.
[10] J. Bernardo. Reference Analysis. 2005.
[11] Yee Whye Teh, et al. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. ICLR, 2016.