Information Theoretic Angle-Based Spectral Clustering: A Theoretical Analysis and an Algorithm

Recent work has revealed a close connection between certain information theoretic divergence measures and properties of Mercer kernel feature spaces. Specifically, it has been proposed that an information theoretic measure may be used as a cost function for clustering in a kernel space, approximated by the spectral properties of the Laplacian matrix. In this paper we extend this result to other kernel matrices. We develop an algorithm for the actual clustering based on comparing angles between data points, and demonstrate that the proposed method performs on par with a state-of-the-art spectral clustering method. We also point out some drawbacks of spectral clustering related to outliers and suggest measures to mitigate them.
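To make the angle-based idea concrete, the following is a minimal sketch of how such a clustering step could look: an RBF kernel matrix, a spectral embedding from its top eigenvectors, and a spherical-k-means-style assignment in which each point joins the cluster whose mean direction makes the smallest angle with it. All names and parameter choices (rbf_kernel, angle_cluster, sigma, the initialization) are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gaussian (Mercer) kernel matrix from pairwise squared distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def angle_cluster(X, n_clusters, sigma=1.0, n_iter=50, seed=0):
    # Sketch only: eigenvector embedding of a kernel matrix, then
    # angle-based assignment (assumed setup, not the paper's algorithm).
    K = rbf_kernel(X, sigma)
    w, V = np.linalg.eigh(K)
    top = np.argsort(w)[::-1][:n_clusters]
    # Rows of the scaled top eigenvectors approximate the kernel
    # feature-space images of the data points.
    E = V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
    # Normalize to the unit sphere so comparisons reduce to angles.
    E /= np.linalg.norm(E, axis=1, keepdims=True) + 1e-12

    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=len(X))
    for _ in range(n_iter):
        # Mean direction of each cluster in the embedding space.
        means = np.vstack([
            E[labels == c].mean(axis=0) if np.any(labels == c)
            else rng.standard_normal(E.shape[1])
            for c in range(n_clusters)
        ])
        means /= np.linalg.norm(means, axis=1, keepdims=True) + 1e-12
        # Assign each point to the cluster mean with the smallest angle
        # (largest cosine similarity).
        new_labels = np.argmax(E @ means.T, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

In this sketch the angle criterion plays the role of the distance used in ordinary k-means; working with directions rather than Euclidean distances is what makes the assignment insensitive to the norm of each embedded point.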
