Information cut and information forces for clustering

We define an information-theoretic divergence measure between probability density functions (pdfs) that has a deep connection to the cut in graph theory. This connection is revealed when the pdfs are estimated by the Parzen method with a Gaussian kernel. We refer to our divergence measure as the information cut. The information cut provides a theoretically sound criterion for cluster evaluation, and in this paper we show that it can be used to merge clusters. The initial clusters are obtained from the related concept of information forces: we create directed trees by selecting the predecessor of each node (pattern) according to the direction of the information force acting on that pattern. Each directed tree corresponds to a cluster, yielding an initial partitioning of the data set. Subsequently, we use the information cut as a cluster evaluation function to merge clusters until a predefined number of clusters is reached. We demonstrate the performance of our information-theoretic clustering method on both artificially created data and real data, with encouraging results.
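
To make the merging stage concrete, the sketch below shows one plausible way to evaluate an information cut between two candidate clusters once the pdfs are Parzen-estimated with a Gaussian kernel: the cross-cluster sum of kernel affinities (the graph cut) is normalized by the within-cluster affinity sums, as suggested by the Cauchy-Schwarz form of such divergences. The exact normalization, the bandwidth `sigma`, and the greedy merge rule (merge the pair with the largest cut ratio, i.e., the most overlapping Parzen pdfs) are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    # Pairwise Gaussian (Parzen window) kernel evaluations for the samples in X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def information_cut(K, idx1, idx2):
    # Assumed form: cross-cluster kernel sum (the graph cut) normalized by the
    # square root of the product of the within-cluster kernel sums.
    cut = K[np.ix_(idx1, idx2)].sum()
    vol1 = K[np.ix_(idx1, idx1)].sum()
    vol2 = K[np.ix_(idx2, idx2)].sum()
    return cut / np.sqrt(vol1 * vol2)

def merge_clusters(X, labels, n_clusters, sigma=1.0):
    # Greedy agglomeration (an assumed merge rule): repeatedly merge the pair of
    # clusters with the largest information cut between them, i.e., the pair
    # whose Parzen pdf estimates overlap the most, until n_clusters remain.
    labels = np.asarray(labels)
    K = gaussian_kernel_matrix(X, sigma)
    clusters = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    while len(clusters) > n_clusters:
        keys = list(clusters)
        a, b = max(
            ((p, q) for i, p in enumerate(keys) for q in keys[i + 1:]),
            key=lambda pq: information_cut(K, clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] = np.concatenate([clusters[a], clusters.pop(b)])
    merged = np.empty(len(X), dtype=int)
    for new_label, idx in enumerate(clusters.values()):
        merged[idx] = new_label
    return merged

if __name__ == "__main__":
    # Toy usage: two Gaussian blobs, over-segmented into 10 seed clusters that
    # stand in for the directed-tree initialization, then merged down to 2.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(4.0, 1.0, (50, 2))])
    seeds = np.arange(100) % 10
    print(merge_clusters(X, seeds, n_clusters=2, sigma=1.0))
```

In the full method, the seed clusters would instead come from the directed trees induced by the information forces; the snippet above only illustrates the cut-based merging stage under the stated assumptions.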
