Mixed Bregman Clustering with Approximation Guarantees

Two recent breakthroughs have dramatically improved the scope and performance of k-means clustering: squared Euclidean seeding for the initialization step, and Bregman clustering for the iterative step. In this paper, we first unite the two frameworks by generalizing the former improvement to Bregman seeding, a biased randomized seeding technique using Bregman divergences, while also generalizing its important theoretical approximation guarantees. The result is a complete Bregman hard clustering algorithm that integrates the distortion at hand in both the initialization and iterative steps. Our second contribution is to further generalize this algorithm to handle mixed Bregman distortions, which smooth out the asymmetry of Bregman divergences. In contrast to some other symmetrization approaches, our approach keeps the algorithm simple and allows us to generalize theoretical guarantees from regular Bregman clustering. Preliminary experiments show that using the proposed seeding with a suitable Bregman divergence can help us discover the underlying structure of the data.
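To make the idea of biased randomized seeding concrete, the following is a minimal sketch, not the paper's reference implementation. It assumes the seeding works like squared Euclidean seeding with the squared distance replaced by a Bregman divergence: the first seed is drawn uniformly, and each later seed is drawn with probability proportional to its divergence to the closest seed chosen so far. The function names (`bregman_seeding`, `generalized_kl`, `squared_euclidean`) and the argument order `divergence(point, seed)` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def squared_euclidean(x, c):
    """Bregman divergence generated by phi(x) = ||x||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def generalized_kl(x, c):
    """Bregman divergence generated by phi(x) = sum_i x_i log x_i.

    Assumes all coordinates are strictly positive.
    """
    return sum(xi * math.log(xi / ci) - xi + ci for xi, ci in zip(x, c))


def bregman_seeding(points, k, divergence, rng):
    """Pick k seeds from `points` (hypothetical helper, illustrative only).

    Assumption: the divergence is evaluated as divergence(point, seed).
    """
    # First seed: uniform over the data.
    seeds = [rng.choice(points)]
    # closest[i] holds the divergence from points[i] to its nearest seed.
    # max(0.0, ...) guards against tiny negative floating-point values.
    closest = [max(0.0, divergence(p, seeds[0])) for p in points]
    while len(seeds) < k:
        if sum(closest) == 0.0:
            # Every point coincides with a seed: fall back to uniform.
            new_seed = rng.choice(points)
        else:
            # Biased draw: weight proportional to current distortion.
            new_seed = rng.choices(points, weights=closest, k=1)[0]
        seeds.append(new_seed)
        closest = [
            min(d, max(0.0, divergence(p, new_seed)))
            for p, d in zip(points, closest)
        ]
    return seeds


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.1, 0.9), (10.0, 10.0), (9.0, 11.0)]
    print(bregman_seeding(data, 2, generalized_kl, random.Random(0)))
```

Passing `squared_euclidean` as the divergence reduces the sketch to ordinary squared Euclidean seeding; the seeds would then be handed to a Bregman hard clustering loop for the iterative step.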
