Bregman Divergences and Triangle Inequality

While Bregman divergences have been used for clustering and embedding problems in recent years, their asymmetry and their failure to satisfy the triangle inequality have been major concerns. In this paper, we investigate the relationship between two families of symmetrized Bregman divergences and metrics that satisfy the triangle inequality. The first family can be derived from any well-behaved convex function. The second family generalizes the Jensen-Shannon divergence, and can only be derived from convex functions with a certain conditional positive definiteness structure. We interpret the required structure in terms of cumulants of infinitely divisible distributions and of related results in harmonic analysis. We investigate k-means-type clustering problems using both families of symmetrized divergences, and give efficient algorithms for them.
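To make the two concerns concrete, the following is a minimal sketch, not the paper's construction or algorithms. It uses the standard definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> and checks three things numerically: the KL divergence (the Bregman divergence of negative entropy) is asymmetric, the squared Euclidean distance (the Bregman divergence of the squared norm) violates the triangle inequality, and the square root of the Jensen-Shannon divergence, which is known to be a metric, satisfies it on one example triple. All function names and the test vectors are illustrative assumptions.

```python
import math

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Negative entropy: its Bregman divergence is the KL divergence
# when both arguments are probability vectors with positive entries.
def neg_entropy(p):
    return sum(pi * math.log(pi) for pi in p)

def grad_neg_entropy(p):
    return [math.log(pi) + 1.0 for pi in p]

# Squared Euclidean norm: its Bregman divergence is ||x - y||^2.
def sq_norm(x):
    return sum(xi * xi for xi in x)

def grad_sq_norm(x):
    return [2.0 * xi for xi in x]

def kl(p, q):
    return bregman(neg_entropy, grad_neg_entropy, p, q)

def jensen_shannon(p, q):
    """Average KL divergence of p and q to their midpoint m."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Illustrative probability vectors (assumed, not from the paper).
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
r = [0.3, 0.4, 0.3]

# 1) Asymmetry: KL(p, q) and KL(q, p) differ.
print("KL(p,q) =", kl(p, q), " KL(q,p) =", kl(q, p))

# 2) Squared Euclidean distance on the points 0, 1, 2 of the real line:
#    the direct divergence exceeds the sum of the two legs.
d02 = bregman(sq_norm, grad_sq_norm, [0.0], [2.0])
d01 = bregman(sq_norm, grad_sq_norm, [0.0], [1.0])
d12 = bregman(sq_norm, grad_sq_norm, [1.0], [2.0])
print("d(0,2) =", d02, " d(0,1) + d(1,2) =", d01 + d12)

# 3) sqrt(JS) on the triple (p, r, q): the direct distance does not
#    exceed the detour through r.
def js_dist(a, b):
    return math.sqrt(jensen_shannon(a, b))

print("sqrt JS direct =", js_dist(p, q),
      " via r =", js_dist(p, r) + js_dist(r, q))
```

A single triple does not prove the triangle inequality; the check in step 3 only illustrates a property that is established analytically for the Jensen-Shannon case.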
