Neighbor number, valley seeking and clustering

This paper proposes a novel nonparametric clustering algorithm capable of identifying clusters of arbitrary shape. The algorithm is based on nonparametric estimates of the normalized density derivative (NDD) and of the local convexity of the density function, both of which can be expressed concisely in terms of neighbor numbers. We use the NDD to measure the dissimilarity between each pair of observations in a local neighborhood and to build a connectivity graph. Combined with the local convexity, this dissimilarity measure detects observations lying in local minima (valleys) of the density function, which separate observations belonging to different major clusters. We show that the algorithm is closely related to single-linkage hierarchical clustering and can be viewed as an extension of it. The algorithm is evaluated on both synthetic and real datasets, and an application to color image segmentation is also given. Comparisons with several representative existing algorithms show that the proposed method can robustly identify major clusters even in complex configurations and/or when clusters overlap substantially.
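The abstract does not give the exact neighbor-number formulas, so the following Python sketch only illustrates the ingredients under stated assumptions; it is not the authors' algorithm. `ndd_estimate` uses the standard mean-shift estimator of Fukunaga and Hostetler (1975): with a uniform ball kernel of radius r in d dimensions, the average displacement from x to the points in its ball is approximately r²/(d+2) · ∇p(x)/p(x). `valley_points` is a hypothetical stand-in for the paper's local-convexity test: a point is flagged as lying in a valley when its neighbor number falls below a fraction `ratio` of the mean neighbor number of its neighbors. `cluster` then takes connected components of the r-neighborhood graph over the unflagged points. Without valley removal these components are exactly the single-linkage clusters cut at height r, which is one way to see how valley seeking extends single linkage. All function names and the `ratio` parameter are illustrative.

```python
import math
from collections import deque


def neighbor_lists(points, r):
    """For each point, the indices of the other points within distance r.
    The neighbor number of point i is len(result[i])."""
    return [
        [j for j, q in enumerate(points) if j != i and math.dist(p, q) <= r]
        for i, p in enumerate(points)
    ]


def ndd_estimate(points, nbrs, r):
    """Mean-shift estimate of grad p(x) / p(x) with a uniform ball kernel
    (Fukunaga-Hostetler): (d + 2) / r**2 * (mean of points in the ball - x).
    The ball includes x itself, which adds zero displacement but counts in k."""
    d = len(points[0])
    out = []
    for i, p in enumerate(points):
        k = len(nbrs[i]) + 1  # neighbor number plus the point itself
        shift = [sum(points[j][a] - p[a] for j in nbrs[i]) / k for a in range(d)]
        out.append([(d + 2) / r ** 2 * s for s in shift])
    return out


def valley_points(nbrs, ratio=0.5):
    """HYPOTHETICAL valley rule (not the paper's convexity test): flag point i
    when its neighbor number is below `ratio` times the mean neighbor number
    of its neighbors."""
    counts = [len(nb) for nb in nbrs]
    flags = []
    for i, nb in enumerate(nbrs):
        if not nb:
            flags.append(True)  # isolated point: treated as valley/noise
            continue
        local_mean = sum(counts[j] for j in nb) / len(nb)
        flags.append(counts[i] < ratio * local_mean)
    return flags


def cluster(points, r, ratio=0.5):
    """Connected components of the r-neighborhood graph restricted to
    non-valley points (a single-linkage cut at height r on those points).
    Valley points receive the label -1."""
    nbrs = neighbor_lists(points, r)
    valley = valley_points(nbrs, ratio)
    labels = [-1] * len(points)
    next_label = 0
    for start in range(len(points)):
        if valley[start] or labels[start] != -1:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first search over non-valley neighbors
            i = queue.popleft()
            for j in nbrs[i]:
                if not valley[j] and labels[j] == -1:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    return labels


if __name__ == "__main__":
    # Two dense runs on a line joined by two sparse "bridge" points (1.6, 2.3).
    xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.6, 2.3,
          3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9]
    pts = [(x,) for x in xs]
    print(cluster(pts, r=0.75))             # two clusters; bridge points get -1
    print(cluster(pts, r=0.75, ratio=0.0))  # plain single linkage: one chained cluster
```

On this toy data, plain single linkage at r = 0.75 chains everything into one cluster through the bridge points, while removing the flagged low-density points splits the data into the two dense runs. The fixed threshold `ratio` is only a placeholder for a principled valley criterion such as the local convexity used in the paper.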
