Unsupervised cluster discovery using statistics in scale space

This paper presents a method for the unsupervised discovery of valid clusters using statistics on the modes of the probability density function in scale space. First, Gaussian scale-space theory is applied to kernel density estimation to derive the hierarchical relationships among the modes of the probability density function in scale space. The data points are then classified into clusters according to this mode hierarchy. Second, an algorithm for cluster discovery is presented. Valid clusters are discovered by testing whether each cluster is distinguishable from the spurious clusters obtained from uniformly random points. This statistical hypothesis test requires the distributional forms of the annihilation scales of the modes estimated from the uniformly random points; these forms are shown experimentally to be unimodal. Finally, cluster discovery is demonstrated on synthetic data and benchmark data.
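The idea of tracking density modes across scales can be illustrated with a minimal one-dimensional sketch. It is not the paper's algorithm: it only evaluates a Gaussian kernel density estimate at several bandwidths (scales) and counts the modes at each, so that the scale at which a mode disappears (its annihilation scale) can be read off. The function names, the toy data, the grid, and the list of scales are all assumptions made for illustration, and the hypothesis test against uniformly random points is not implemented here.

```python
import numpy as np


def gaussian_kde_on_grid(points, sigma, grid):
    """Evaluate a 1-D Gaussian kernel density estimate with bandwidth sigma on grid."""
    # Pairwise scaled distances between grid locations and data points.
    diffs = (grid[:, None] - points[None, :]) / sigma
    kernel = np.exp(-0.5 * diffs**2) / (sigma * np.sqrt(2.0 * np.pi))
    # Average the kernels so the estimate integrates to (approximately) one.
    return kernel.mean(axis=1)


def count_modes(density):
    """Count interior local maxima of a density sampled on a regular grid."""
    mid = density[1:-1]
    # Strictly above the left neighbour, at least as high as the right one;
    # the >= on the right keeps a flat-topped peak from being missed.
    return int(np.sum((mid > density[:-2]) & (mid >= density[2:])))


def mode_counts_over_scales(points, scales, grid):
    """Return the number of modes of the estimate at each scale (hypothetical helper)."""
    return [count_modes(gaussian_kde_on_grid(points, s, grid)) for s in scales]


# Toy data (an assumption): two well-separated groups of points on the line.
points = np.concatenate([np.linspace(-3.5, -2.5, 20), np.linspace(2.5, 3.5, 20)])
grid = np.linspace(-8.0, 8.0, 2001)
scales = [0.5, 1.0, 2.0, 4.0, 5.0]

if __name__ == "__main__":
    for sigma, n_modes in zip(scales, mode_counts_over_scales(points, scales, grid)):
        print(f"sigma={sigma:.1f}: {n_modes} mode(s)")
```

In this toy setting the two modes merge into one somewhere between the scales 2.0 and 4.0. A full treatment would follow each mode through scale space to build the hierarchy and would compare the observed annihilation scales with those obtained from uniformly random points, as the abstract describes.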
