A dependence maximization view of clustering

We propose a family of clustering algorithms based on the maximization of dependence between the input variables and their cluster labels, as expressed by the Hilbert-Schmidt Independence Criterion (HSIC). Under this framework, we unify the geometric, spectral, and statistical-dependence views of clustering, and subsume many existing algorithms as special cases (e.g. k-means and spectral clustering). A distinctive feature of our framework is that kernels can also be applied to the labels, endowing them with particular structure. We also obtain a perturbation bound on the change in k-means clustering.
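To make the objective concrete, here is a minimal sketch (not the authors' implementation) of dependence-maximization clustering: using the empirical estimator HSIC ∝ tr(K H L H), with H the centering matrix, a Gaussian kernel K on the inputs, and a linear kernel L = Y Yᵀ on the cluster-indicator matrix Y, one can attempt to maximize HSIC over labelings by greedy coordinate ascent. The function names (`rbf_kernel`, `hsic_cluster`), the kernel choices, and the greedy update scheme are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def hsic_cluster(X, k, gamma=1.0, max_iter=100, seed=0):
    """Greedily maximize the (unnormalized) HSIC objective tr(H K H Y Y^T),
    where Y is the n-by-k cluster-indicator matrix. The linear label kernel
    Y Y^T yields a kernel-k-means-like objective; this is an illustrative
    sketch, not the paper's exact algorithm."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    Kc = H @ rbf_kernel(X, gamma) @ H          # centered input kernel
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            # Moving point i to cluster c changes the objective by
            # 2 * sum_{j in c, j != i} Kc[i, j] (up to a constant), so pick
            # the cluster with the largest within-cluster affinity to i.
            gains = np.array([
                Kc[i, labels == c].sum() - (Kc[i, i] if labels[i] == c else 0.0)
                for c in range(k)
            ])
            best = int(np.argmax(gains))
            if gains[best] > gains[labels[i]]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    # Two well-separated Gaussian blobs; the greedy HSIC ascent should split them.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
                   rng.normal(3.0, 0.3, (30, 2))])
    print(hsic_cluster(X, k=2, gamma=1.0))
```

With the linear label kernel this greedy ascent behaves like kernel k-means on the centered kernel; structured label kernels, as the abstract notes, would replace Y Yᵀ with a richer choice of L.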
