The geometry of prior selection

This contribution is devoted to the selection of a prior in a Bayesian learning framework. There is an extensive literature on the construction of non-informative priors, and the subject seems far from a definite solution (Kass and Wasserman [8]). We consider this problem in the light of recently developed information-geometric tools (Amari and Nagaoka [14]). The differential-geometric analysis allows the prior selection problem to be formulated on a general manifold of probability distributions. To construct the prior distribution, we propose a criterion expressing the trade-off between decision error and a uniformity constraint. The solution has an explicit expression, obtained by variational calculus. In addition, it has two important invariance properties: invariance with respect to the dominating measure of the data space, and invariance with respect to the parametrization of a restricted parametric manifold. We show that constructing the prior by projection is the best way to take into account the restriction to a particular family of parametric models, and we apply this procedure to autoparallel restricted families. Two practical examples illustrate the proposed prior construction. The first deals with learning a mixture of multivariate Gaussians in a classification setting; we show how penalizing the likelihood with the proposed prior eliminates the degeneracy that occurs when approaching singular points. The second treats the blind source separation problem.
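To make the Gaussian-mixture example concrete, the sketch below shows the standard mechanism by which a prior on the covariance matrices removes the likelihood degeneracy: an EM loop whose M-step covariance update is regularized by an inverse-Wishart-style penalty. This is a minimal illustration of penalized maximum likelihood under assumed choices, not the paper's exact prior; the function name em_penalized_gmm, the penalty weight nu, and the scale matrix S0 are illustrative assumptions.

```python
import numpy as np

def em_penalized_gmm(X, K, nu=2.0, n_iter=100, seed=0):
    """Penalized EM for a K-component Gaussian mixture on data X of shape (n, d).

    Illustrative sketch: nu and S0 are assumed regularization choices,
    not quantities taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    S0 = 0.1 * np.cov(X.T) + 1e-6 * np.eye(d)          # prior scale matrix (assumed)
    pi = np.full(K, 1.0 / K)                           # mixing proportions
    mu = X[rng.choice(n, size=K, replace=False)].copy()  # means from random data points
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)

    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to pi_k * N(x_i | mu_k, Sigma_k),
        # computed in log space for numerical stability.
        logr = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            _, logdet = np.linalg.slogdet(Sigma[k])
            maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma[k]), diff)
            logr[:, k] = np.log(pi[k]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: the inverse-Wishart-style penalty adds nu * S0 to the scatter
        # matrix and nu + d + 1 to the denominator, so Sigma[k] stays positive
        # definite even when a component concentrates on a single data point.
        Nk = r.sum(axis=0)
        pi = Nk / n
        for k in range(K):
            mu[k] = r[:, k] @ X / Nk[k]
            diff = X - mu[k]
            scatter = (r[:, k, None] * diff).T @ diff
            Sigma[k] = (scatter + nu * S0) / (Nk[k] + nu + d + 1)
    return pi, mu, Sigma
```

With nu = 0 the update reduces to the ordinary maximum-likelihood estimate, where a component collapsing onto a single data point drives its covariance determinant to zero and the likelihood to infinity; the penalty keeps every Sigma[k] positive definite and the penalized likelihood bounded.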

[1] V. Balasubramanian, Statistical Inference, Occam's Razor, and Statistical Mechanics on the Space of Probability Distributions, Neural Computation, 1996.

[2] S. Amari, Differential-geometrical methods in statistics, 1985.

[3] H. Snoussi, A. Mohammad-Djafari, Information geometry and prior selection, 2003.

[4] C. C. Rodriguez, Entropic priors for discrete probabilistic networks and for mixtures of Gaussians models, physics/0201016, 2002.

[5] H. Snoussi, A. Mohammad-Djafari, Penalized maximum likelihood for multivariate Gaussian mixture, 2002.

[6] H. Snoussi, A. Mohammad-Djafari, MCMC joint separation and segmentation of hidden Markov fields, in: Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, 2002.

[7] C. C. Rodriguez, Entropic priors, 1991.

[8] R. E. Kass, L. Wasserman, Formal rules for selecting prior distributions: A review and annotated bibliography, Technical Report No. 583, Department of Statistics, Carnegie Mellon University, 1993.

[9] K. H. Knuth, A Bayesian approach to source separation, 1999.

[10] V. Balasubramanian, A Geometric Formulation of Occam's Razor for Inference of Parametric Distributions, adap-org/9601001, 1996.

[11] G. E. P. Box, G. C. Tiao, Bayesian inference in statistical analysis, 1973.

[12] H. Zhu, R. Rohwer, Bayesian invariant measurements of generalisation for continuous distributions, 1995.

[13] H. Zhu, R. Rohwer, Bayesian invariant measurements of generalisation for discrete distributions, 1995.

[14] S. Amari, H. Nagaoka, Methods of information geometry, Translations of Mathematical Monographs, vol. 191, AMS/Oxford University Press, 2000.