Integration of Stochastic Models by Minimizing α-Divergence

When a number of stochastic models are given in the form of probability distributions, one often needs to integrate them. Mixtures of distributions are frequently used, but exponential mixtures also provide a good means of integration. This letter proposes a one-parameter family of integration, called α-integration, which includes these well-known integrations as special cases. They are generalizations of various averages of numbers, such as the arithmetic, geometric, and harmonic averages. There are psychophysical experiments that suggest α-integration is used in the brain. The α-divergence between two distributions is defined as a natural generalization of the Kullback-Leibler divergence and the Hellinger distance, and it is proved that α-integration is optimal in the sense of minimizing the α-divergence. The theory is applied to generalize the mixture of experts and the product of experts to the α-mixture of experts. The α-predictive distribution is also stated in the Bayesian framework.
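As an illustration of the ideas above, the following minimal numerical sketch (in Python) implements one common convention for α-integration and α-divergence, namely the representation f_α(u) = u^((1-α)/2) for α ≠ 1 and f_α(u) = log u for α = 1. The exact parameterization and all function names below are assumptions made for the sketch, not taken verbatim from the letter. Under this convention, α = -1, 1, and 3 recover the arithmetic (mixture), geometric (exponential mixture), and harmonic means, respectively.

    # Sketch of alpha-integration and alpha-divergence (assumed convention:
    # f_alpha(u) = u**((1 - alpha)/2) for alpha != 1, log(u) for alpha = 1;
    # function names are illustrative, not from the paper).
    import numpy as np

    def f_alpha(u, alpha):
        return np.log(u) if alpha == 1 else u ** ((1 - alpha) / 2)

    def f_alpha_inv(v, alpha):
        return np.exp(v) if alpha == 1 else v ** (2 / (1 - alpha))

    def alpha_integrate(ps, ws, alpha, normalize=True):
        # Weighted alpha-mean of the rows of ps (discrete distributions).
        m = f_alpha_inv(ws @ f_alpha(ps, alpha), alpha)
        return m / m.sum() if normalize else m

    def alpha_divergence(p, q, alpha):
        # Assumed form: D_alpha[p:q] = 4/(1 - alpha^2) * (1 - sum p^((1-a)/2) q^((1+a)/2)).
        # Its limits as alpha -> +1 and alpha -> -1 are Kullback-Leibler divergences,
        # and alpha = 0 gives a quantity proportional to the squared Hellinger distance.
        if alpha == 1:
            return np.sum(q * np.log(q / p))
        if alpha == -1:
            return np.sum(p * np.log(p / q))
        s = np.sum(p ** ((1 - alpha) / 2) * q ** ((1 + alpha) / 2))
        return 4.0 / (1.0 - alpha ** 2) * (1.0 - s)

    # Two toy distributions on three points, integrated with equal weights.
    p = np.array([0.2, 0.3, 0.5])
    q = np.array([0.4, 0.4, 0.2])
    ps, ws = np.stack([p, q]), np.array([0.5, 0.5])

    for a in (-1, 1, 3):  # arithmetic, geometric, harmonic means of p and q
        print(a, alpha_integrate(ps, ws, a, normalize=False))
    print("D_0[p:q] =", alpha_divergence(p, q, 0))

In this sketch, normalize=True rescales the α-mean to a probability distribution, which is the form needed when the integrated model must itself be a distribution.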
