Monte Carlo Information-Geometric Structures

Exponential families and mixture families are parametric probability models that can be studied geometrically as smooth statistical manifolds with respect to any statistical divergence, such as the Kullback–Leibler (KL) divergence or the Hellinger divergence. When a statistical manifold is equipped with the KL divergence, the induced structure is dually flat, and the KL divergence between two distributions amounts to an equivalent Bregman divergence between their corresponding parameters. In practice, the corresponding Bregman generators of mixture/exponential families require computing definite integrals that may either be too time-consuming (for exponentially large discrete supports) or admit no closed-form formula (for continuous supports). In those cases, the dually flat construction remains purely theoretical and cannot be used by information-geometric algorithms. To bypass this problem, we consider stochastic Monte Carlo (MC) estimation of those integral-based mixture/exponential family Bregman generators. We show that, under natural assumptions, these MC generators are almost surely Bregman generators. We define a series of dually flat information geometries, termed Monte Carlo Information Geometries (MCIGs), that increasingly finely approximate the intractable geometry. The advantage of the MCIG is that it enables practical use of the Bregman algorithmic toolbox on a broad range of probability distribution families. We demonstrate our approach with a clustering task on a mixture family manifold. We then show how to generate an MCIG for an arbitrary separable statistical divergence between distributions belonging to the same parametric family.
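To make the idea of an MC-estimated Bregman generator concrete, here is a minimal Python sketch, under our own illustrative assumptions (two prescribed Gaussian components and a fixed standard-normal importance-sampling proposal; names like `F_hat` are hypothetical, not from the paper). It estimates the Shannon negentropy generator F(θ) = ∫ m_θ(x) log m_θ(x) dx of a two-component mixture family m_θ = (1−θ)p₀ + θp₁. The samples are drawn once and reused for every θ, so the estimate is a deterministic function of θ, and it is convex in θ because m_θ is affine in θ and t ↦ t log t is convex.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), evaluated elementwise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Fixed MC sample: drawn ONCE from the proposal q = N(0, 1) and reused
# for every parameter theta, so F_hat below is a deterministic convex
# function of theta (a Bregman generator for this sample, in the spirit
# of the MCIG construction).
rng = np.random.default_rng(0)
S = 10_000
x = rng.standard_normal(S)
q = normal_pdf(x, 0.0, 1.0)

# Two prescribed mixture components (illustrative choices).
p0 = normal_pdf(x, -2.0, 1.0)
p1 = normal_pdf(x, +2.0, 1.0)

def F_hat(theta):
    """Importance-sampling estimate of the negentropy generator
    F(theta) = ∫ m_theta(x) log m_theta(x) dx, with
    m_theta = (1 - theta) * p0 + theta * p1 and 0 < theta < 1."""
    m = (1.0 - theta) * p0 + theta * p1
    return float(np.mean(m * np.log(m) / q))
```

A quick sanity check of convexity is that the midpoint value never exceeds the average of the endpoint values, e.g. `F_hat(0.5) <= 0.5 * (F_hat(0.1) + F_hat(0.9))`; with such a generator in hand, the usual Bregman toolbox (e.g. Bregman k-means on the mixture parameters) applies directly.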
