On the Geometry of Mixtures of Prescribed Distributions

We consider the space of w-mixtures, that is, finite statistical mixtures sharing the same prescribed component distributions (for example, Gaussian mixture models sharing the same Gaussian components). The information geometry induced by the Kullback-Leibler (KL) divergence yields a dually flat space in which the KL divergence between two w-mixtures amounts to a Bregman divergence for the negative Shannon entropy generator, called the Shannon information. Furthermore, we prove that the skew Jensen-Shannon statistical divergence between w-mixtures amounts to a skew Jensen divergence on their parameters, and we state several divergence inequalities between w-mixtures and their closures.

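As a sketch of the stated Bregman identity, in assumed notation consistent with the abstract (k+1 prescribed component densities p_0, …, p_k and free mixture weights θ = (θ_1, …, θ_k), with θ_0 = 1 − θ_1 − … − θ_k), a w-mixture and its convex generator read:

\[
m_\theta(x) = \sum_{i=0}^{k} \theta_i\, p_i(x), \qquad
F(\theta) = -H(m_\theta) = \int m_\theta(x) \log m_\theta(x)\, \mathrm{d}\mu(x),
\]
and the claimed identity is
\[
\mathrm{KL}(m_\theta : m_{\theta'}) = B_F(\theta : \theta')
= F(\theta) - F(\theta') - \sum_{i=1}^{k} (\theta_i - \theta'_i)\, \partial_i F(\theta'),
\]
so that the negative Shannon entropy (Shannon information) F plays the role of a convex Bregman generator on the open weight simplex.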