Distribution Free Decomposition of Multivariate Data

We present a practical approach to nonparametric cluster analysis of large data sets. The number of clusters and the cluster centres are derived automatically by mode seeking with the mean shift procedure on a reduced set of points randomly selected from the data. The cluster boundaries are delineated using a k-nearest neighbour technique. The proposed algorithm is stable and efficient: a 10,000-point data set is decomposed in only a few seconds. Complex clustering examples and applications are discussed, and convergence of the gradient ascent mean shift procedure is demonstrated for data of arbitrary distribution and cardinality.
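The two stages described above (mode seeking on a random subsample, then k-nearest neighbour labelling of the full data set) can be illustrated with a minimal sketch. It is not the authors' implementation: the flat (uniform) kernel, the fixed `bandwidth`, the rule for merging nearby modes, the majority vote, and all function and parameter names (`mean_shift_modes`, `merge_modes`, `decompose`, `n_sample`, `k`) are assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def mean_shift_modes(sample, bandwidth, tol=1e-4, max_iter=200):
    """Shift every sample point uphill until it settles near a density mode.

    Assumption: a flat kernel, so each step moves to the mean of the sample
    points lying within `bandwidth` of the current position.
    """
    converged = sample.copy()
    for i in range(len(sample)):
        x = sample[i]
        for _ in range(max_iter):
            inside = np.linalg.norm(sample - x, axis=1) <= bandwidth
            if not inside.any():  # defensive guard against an empty window
                break
            new_x = sample[inside].mean(axis=0)
            if np.linalg.norm(new_x - x) < tol:
                break
            x = new_x
        converged[i] = x
    return converged


def merge_modes(converged, bandwidth):
    """Keep one representative per group of converged points closer than `bandwidth`."""
    centres = []
    for p in converged:
        if all(np.linalg.norm(p - c) > bandwidth for c in centres):
            centres.append(p)
    return np.array(centres)


def decompose(data, n_sample, bandwidth, k=5, seed=0):
    """Return (cluster centres, label per data point)."""
    rng = np.random.default_rng(seed)
    # Stage 1: mode seeking on a reduced, randomly selected set of points.
    sample = data[rng.choice(len(data), size=n_sample, replace=False)]
    converged = mean_shift_modes(sample, bandwidth)
    centres = merge_modes(converged, bandwidth)
    # Each sample point takes the label of the centre nearest to where it converged.
    dist_to_centres = np.linalg.norm(converged[:, None] - centres[None], axis=2)
    sample_labels = dist_to_centres.argmin(axis=1)
    # Stage 2: label every data point by majority vote among its k nearest sample points.
    dist_to_sample = np.linalg.norm(data[:, None] - sample[None], axis=2)
    nearest = np.argsort(dist_to_sample, axis=1)[:, :k]
    votes = sample_labels[nearest]
    labels = np.array(
        [np.bincount(v, minlength=len(centres)).argmax() for v in votes]
    )
    return centres, labels
```

Calling `decompose(data, n_sample=100, bandwidth=2.0)` on an `(N, d)` array returns the estimated centres and one integer label per row. Running mean shift only on the subsample keeps the mode-seeking cost independent of N, and the brute-force distance matrix in stage 2 could be replaced by a tree-based nearest-neighbour search for larger data sets.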
