Monte Carlo Information Geometry: The dually flat case

Exponential families and mixture families are parametric probability models that can be studied geometrically as smooth statistical manifolds with respect to any statistical divergence, such as the Kullback-Leibler (KL) divergence or the Hellinger divergence. When a statistical manifold is equipped with the KL divergence, the induced manifold structure is dually flat, and the KL divergence between two distributions amounts to an equivalent Bregman divergence on their corresponding parameters. In practice, the Bregman generators of mixture and exponential families require evaluating definite integrals that can be too time-consuming to compute (for discrete supports of exponentially large size) or may not admit a closed-form formula at all (for continuous supports). In these cases, the dually flat construction remains theoretical and cannot be used by information-geometric algorithms. To bypass this problem, we perform stochastic Monte Carlo (MC) estimation of these integral-based mixture/exponential family Bregman generators. We show that, under natural assumptions, these MC generators are almost surely Bregman generators. We define a series of dually flat information geometries, termed Monte Carlo Information Geometries (MCIG), that approximate the intractable geometry increasingly finely. The advantage of MCIG is that it allows practical use of the Bregman algorithmic toolbox on a wide range of probability distribution families. We demonstrate our approach with a clustering task on a mixture family manifold.
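As an illustration of the Monte Carlo estimation described above, the following is a minimal sketch for a one-parameter mixture family, not the paper's implementation. It assumes two prescribed Gaussian components, takes the generator to be the negative Shannon entropy F(theta) = ∫ m(x; theta) log m(x; theta) dx, and estimates it by importance sampling from a fixed proposal q (here the uniform mixture of the two components). The component parameters, the proposal, and all function names (`components`, `make_mc_generator`, `bregman`) are assumptions made for this example.

```python
import numpy as np

# Hypothetical prescribed components: N(-2, 1) and N(2, 1).
MUS = np.array([-2.0, 2.0])
SIGMAS = np.array([1.0, 1.0])


def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))


def components(x):
    """Return the two component densities p0(x), p1(x) as a (2, n) array."""
    return np.stack([normal_pdf(x, m, s) for m, s in zip(MUS, SIGMAS)])


def make_mc_generator(n_samples, seed=0):
    """Build an MC estimate of F(theta) = integral of m log m, with
    m(x; theta) = (1 - theta) * p0(x) + theta * p1(x), 0 < theta < 1.

    The samples are drawn once from the proposal q = (p0 + p1) / 2 and then
    frozen, so the returned F is a deterministic function of theta.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, 2, size=n_samples)   # pick a component per sample
    xs = rng.normal(MUS[idx], SIGMAS[idx])     # draw from the chosen component
    p0, p1 = components(xs)
    q = 0.5 * (p0 + p1)                        # proposal density at the samples

    def F(theta):
        m = (1.0 - theta) * p0 + theta * p1
        return np.mean(m * np.log(m) / q)

    def grad_F(theta):
        m = (1.0 - theta) * p0 + theta * p1
        # d/dtheta [m log m] = (p1 - p0) * (log m + 1)
        return np.mean((p1 - p0) * (np.log(m) + 1.0) / q)

    return F, grad_F


def bregman(F, grad_F, theta1, theta2):
    """Bregman divergence B_F(theta1 : theta2) for a scalar parameter."""
    return F(theta1) - F(theta2) - (theta1 - theta2) * grad_F(theta2)


if __name__ == "__main__":
    F, grad_F = make_mc_generator(20000, seed=0)
    # MC estimate of KL(m(.; 0.2) : m(.; 0.8)) via the MC Bregman divergence.
    print(bregman(F, grad_F, 0.2, 0.8))
```

In this sketch each sample contributes a term that is convex in theta, because u log u is convex and m is affine in theta, so the frozen-sample estimate is itself convex. It is strictly convex as soon as one sample has p0(x) != p1(x), which is consistent with the abstract's claim that MC generators are almost surely Bregman generators. Increasing `n_samples` gives the increasingly fine approximations referred to above.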
