The Burbea-Rao and Bhattacharyya Centroids

We study the centroid with respect to the class of information-theoretic Burbea-Rao divergences, which generalize the celebrated Jensen-Shannon divergence by measuring the non-negative Jensen difference induced by a strictly convex and differentiable function. Although Burbea-Rao divergences are symmetric by construction, they are not metrics, since they fail to satisfy the triangle inequality. We first explain how a particular symmetrization of Bregman divergences, called Jensen-Bregman distances, yields exactly these Burbea-Rao divergences. We then define skew Burbea-Rao divergences and show that, in limit cases, they amount to computing Bregman divergences. We then prove that Burbea-Rao centroids can be approximated arbitrarily finely by a generic iterative concave-convex optimization algorithm with guaranteed convergence. In the second part of the paper, we consider the Bhattacharyya distance, which is commonly used to measure the degree of overlap between probability distributions. We show that the Bhattacharyya distance between members of the same statistical exponential family amounts to computing a Burbea-Rao divergence in disguise. We thus obtain an efficient algorithm for computing the Bhattacharyya centroid of a set of parametric distributions belonging to the same exponential family, improving over the specialized methods previously found in the literature, which were limited to univariate or “diagonal” multivariate Gaussians. To illustrate the performance of our Bhattacharyya/Burbea-Rao centroid algorithm, we present experimental results for k-means and hierarchical clustering of Gaussian mixture models.
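As a rough illustration of the ideas summarized above, the following sketch computes a Burbea-Rao divergence, BR_F(p, q) = (F(p) + F(q))/2 - F((p + q)/2), and approximates a centroid with a concave-convex style fixed-point iteration. It is a minimal sketch under stated assumptions, not the paper's implementation: the generator F(x) = sum_i x_i log x_i (negative Shannon entropy, which gives a Jensen-Shannon-type divergence on positive vectors) is chosen only for illustration, the update rule is obtained here by setting the gradient of the weighted cost to zero, and all function names are hypothetical. The weights are assumed to be positive and to sum to one.

```python
import math

def F(x):
    """Illustrative generator: negative Shannon entropy, sum_i x_i log x_i (x_i > 0)."""
    return sum(xi * math.log(xi) for xi in x)

def grad_F(x):
    """Gradient of F: component-wise log(x_i) + 1."""
    return [math.log(xi) + 1.0 for xi in x]

def grad_F_inv(y):
    """Inverse of the gradient map: component-wise exp(y_i - 1)."""
    return [math.exp(yi - 1.0) for yi in y]

def burbea_rao(p, q):
    """Jensen difference (F(p) + F(q))/2 - F((p + q)/2)."""
    mid = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return (F(p) + F(q)) / 2.0 - F(mid)

def centroid_cost(c, points, weights):
    """Weighted average divergence from c to the points."""
    return sum(w * burbea_rao(c, p) for w, p in zip(weights, points))

def burbea_rao_centroid(points, weights, iterations=50):
    """Fixed-point iteration: grad_F(c_next) = sum_i w_i grad_F((c + p_i)/2)."""
    dim = len(points[0])
    # Start from the weighted arithmetic mean (an arbitrary but convenient choice).
    c = [sum(w * p[j] for w, p in zip(weights, points)) for j in range(dim)]
    for _ in range(iterations):
        acc = [0.0] * dim
        for w, p in zip(weights, points):
            g = grad_F([(cj + pj) / 2.0 for cj, pj in zip(c, p)])
            acc = [a + w * gj for a, gj in zip(acc, g)]
        c = grad_F_inv(acc)
    return c
```

For members of one exponential family with log-normalizer F, applying the same `burbea_rao` formula to the natural parameters would give the Bhattacharyya distance, which is the correspondence the abstract refers to. Only the generator and its gradient maps need to be swapped.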
