CLUES: A non-parametric clustering method based on local shrinking

A novel non-parametric clustering method based on local shrinking is proposed. Each data point is moved a specific distance toward a cluster center, with the direction and size of each movement determined by the median of its K nearest neighbors. This process is repeated until a pre-defined convergence criterion is satisfied. The number of neighbors K is chosen by optimizing commonly used index functions that measure the strength of the clusters generated by the algorithm. The number of clusters and the final partition are determined automatically, with no input parameter other than the stopping rule for convergence. Experiments on simulated and real data sets suggest that the proposed algorithm achieves relatively high accuracy compared with classical clustering algorithms.
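The shrinking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (local_shrink, label_shrunk_points), the use of the coordinate-wise median, the counting of each point among its own neighbors, the tolerance tol, and the merge_radius used to group shrunk points are all assumptions made for illustration.

```python
import numpy as np


def local_shrink(X, k, tol=1e-4, max_iter=100):
    """Repeatedly move each point to the coordinate-wise median of its
    k nearest neighbors (assumed update rule) until the largest move is
    below tol or max_iter passes have been made."""
    Z = np.asarray(X, dtype=float).copy()
    for _ in range(max_iter):
        # Pairwise squared Euclidean distances between current positions.
        d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        # Indices of the k closest points; each point counts itself here.
        idx = np.argsort(d2, axis=1)[:, :k]
        # Coordinate-wise median over each neighborhood: shape (n, d).
        Z_new = np.median(Z[idx], axis=1)
        shift = np.abs(Z_new - Z).max()
        Z = Z_new
        if shift < tol:  # assumed stopping rule
            break
    return Z


def label_shrunk_points(Z, merge_radius):
    """Assign the same label to shrunk points that lie within
    merge_radius of a group's first member (a simple grouping rule
    assumed for this sketch)."""
    labels = -np.ones(len(Z), dtype=int)
    centers = []
    for i, z in enumerate(Z):
        for c, center in enumerate(centers):
            if np.linalg.norm(z - center) <= merge_radius:
                labels[i] = c
                break
        else:
            # No existing group is close enough: start a new one.
            centers.append(z)
            labels[i] = len(centers) - 1
    return labels


# Toy data: two well-separated Gaussian blobs of 30 points each.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 0.3, (30, 2)),
    rng.normal(10.0, 0.3, (30, 2)),
])
Z_demo = local_shrink(X_demo, k=10)
labels_demo = label_shrunk_points(Z_demo, merge_radius=5.0)
print("clusters found:", len(set(labels_demo.tolist())))
```

In this sketch K is fixed by hand. The abstract instead describes choosing K by optimizing cluster-strength index functions, which would amount to running the shrinking for several candidate values of K and keeping the partition with the best index value.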
