Kernelized Infomax Clustering

We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x; y) between the unknown cluster labels y and the training patterns x with respect to the parameters of specifically constrained encoding distributions. The constraints are chosen so that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method can conveniently be applied to learning the optimal affinity matrix, which corresponds to learning the parameters of a kernelized encoder. The procedure does not require computing eigenvalues of Gram matrices, which makes it potentially attractive for clustering large data sets.
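To make the objective concrete, the following is a minimal sketch of infomax soft clustering under one particular set of assumptions: an RBF-kernel softmax encoder p(y=k|x) tied to unknown prototype vectors, an empirical estimate I(x; y) = H(y) − H(y|x), and plain gradient ascent via finite differences. The paper does not specify this exact parameterization or optimizer; all names, the toy data, and the hyperparameters (beta, lr) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two well-separated Gaussian blobs (illustrative only).
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)),
               rng.normal(2.0, 0.5, (50, 2))])

def responsibilities(X, protos, beta):
    # Soft encoder p(y=k|x) proportional to exp(beta * K(x, mu_k)), with an
    # RBF kernel to prototype vectors mu_k, so patterns close to the same
    # prototype are likely to be clustered similarly.
    d2 = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = beta * np.exp(-0.5 * d2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def mutual_info(p, eps=1e-12):
    # Empirical I(x; y) = H(y) - H(y|x) under the soft encoder p(y|x).
    py = p.mean(axis=0)
    h_y = -(py * np.log(py + eps)).sum()
    h_y_x = -(p * np.log(p + eps)).sum(axis=1).mean()
    return h_y - h_y_x

def fit(X, n_clusters=2, beta=4.0, lr=0.3, steps=100, h=1e-4):
    # Gradient ascent on I(x; y) over the prototypes via central finite
    # differences -- a sketch only; note no eigendecomposition is needed.
    protos = X[rng.choice(len(X), n_clusters, replace=False)].copy()
    for _ in range(steps):
        grad = np.zeros_like(protos)
        for idx in np.ndindex(protos.shape):
            protos[idx] += h
            f_plus = mutual_info(responsibilities(X, protos, beta))
            protos[idx] -= 2 * h
            f_minus = mutual_info(responsibilities(X, protos, beta))
            protos[idx] += h
            grad[idx] = (f_plus - f_minus) / (2 * h)
        protos += lr * grad
    return protos

protos = fit(X)
p = responsibilities(X, protos, beta=4.0)
mi = mutual_info(p)
print(f"I(x; y) = {mi:.3f} (upper bound log 2 = {np.log(2):.3f})")
```

For two balanced, well-separated blobs the objective is bounded by H(y) = log 2, approached when the soft assignments become confident and balanced; the kernel bandwidth and beta play the role of the affinity parameters that the paper proposes to learn.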
