Sided and Symmetrized Bregman Centroids

In this paper, we generalize the notions of centroids (and barycenters) to the broad class of information-theoretic distortion measures called Bregman divergences. Bregman divergences form a rich and versatile family of distances that unifies quadratic Euclidean distances with various well-known statistical entropic measures. Since Bregman divergences are asymmetric, with the exception of the squared Euclidean distance, we consider the left-sided and right-sided centroids and the symmetrized centroids as minimizers of average Bregman distortions. We prove that all three centroids are unique and give closed-form solutions for the sided centroids, which are generalized means. Furthermore, we design a provably fast and efficient algorithm that approximates the symmetrized centroid arbitrarily closely, based on its exact geometric characterization. The geometric approximation algorithm only requires walking along a geodesic linking the left-sided and right-sided centroids. We report on our implementation for computing entropic centers of image histogram clusters and entropic centers of multivariate normal distributions, which are useful operations for multimedia information processing and retrieval. These experiments illustrate that our generic methods compare favorably with earlier, more limited ad hoc methods.
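
As an illustration of the closed-form sided centroids, the following is a minimal Python sketch and not the implementation reported above. It assumes a separable generator F, the convention that the right-sided centroid minimizes the average of D_F(p_i || c) while the left-sided centroid minimizes the average of D_F(c || p_i), and the extended Kullback-Leibler generator F(x) = x log x - x on positive vectors as the worked case. All function names are hypothetical. Under these assumptions the right-sided centroid is the arithmetic mean and the left-sided centroid is the generalized mean induced by the gradient of F, which here is the coordinate-wise geometric mean.

```python
import math

def bregman(F, grad_F, p, q):
    """D_F(p || q) = F(p) - F(q) - <p - q, grad F(q)> for a separable generator F."""
    return sum(F(pi) - F(qi) - (pi - qi) * grad_F(qi) for pi, qi in zip(p, q))

def right_centroid(points):
    """Minimizer of the average D_F(p_i || c): the arithmetic mean, whatever F is."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def left_centroid(points, grad_F, grad_F_inv):
    """Minimizer of the average D_F(c || p_i): grad_F_inv(mean of grad_F(p_i))."""
    n = len(points)
    return [grad_F_inv(sum(grad_F(x) for x in col) / n) for col in zip(*points)]

# Assumed example generator: extended Kullback-Leibler, F(x) = x log x - x.
F = lambda x: x * math.log(x) - x
grad_F = math.log        # F'(x) = log x
grad_F_inv = math.exp    # inverse of F'

def avg_right_loss(points, c):
    """Average divergence from the points to a candidate center c."""
    return sum(bregman(F, grad_F, p, c) for p in points) / len(points)

def avg_left_loss(points, c):
    """Average divergence from a candidate center c to the points."""
    return sum(bregman(F, grad_F, c, p) for p in points) / len(points)

points = [[1.0, 4.0], [4.0, 1.0]]           # illustrative positive vectors
c_right = right_centroid(points)            # arithmetic mean of each coordinate
c_left = left_centroid(points, grad_F, grad_F_inv)  # geometric mean of each coordinate
print(c_right, c_left)
```

Swapping in another strictly convex generator only requires changing `F`, `grad_F`, and `grad_F_inv`; with F(x) = x^2 both sided centroids reduce to the arithmetic mean, consistent with the symmetry of the squared Euclidean distance.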
