Multivariate Saddle Point Detection for Statistical Clustering

Decomposition methods based on nonparametric density estimation define a cluster as the basin of attraction of a local maximum (mode) of the density function, with the cluster borders being represented by valleys surrounding the mode. To measure the significance of each delineated cluster we propose a test statistics that compares the estimated density of the mode with the estimated maximum density on the cluster boundary. While for a given kernel bandwidth the modes can be safely obtained by using the mean shift procedure, the detection of maximum density points on the cluster boundary (i.e., the saddle points) is not straightforward for multivariate data. We therefore develop a gradient-based iterative algorithm for saddle point detection and show its effectiveness in various data decomposition tasks. After finding the largest density saddle point associated with each cluster, we compute significance measures that allow formal hypothesis testing of cluster existence. The new statistical framework is extended and tested for the task of image segmentation.

[1]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[2]  David G. Stork,et al.  Pattern Classification , 1973 .

[3]  Larry D. Hostetler,et al.  The estimation of the gradient of a density function, with applications in pattern recognition , 1975, IEEE Trans. Inf. Theory.

[4]  Robert Gilmore,et al.  Catastrophe Theory for Scientists and Engineers , 1981 .

[5]  B. Silverman,et al.  Using Kernel Density Estimates to Investigate Multimodality , 1981 .

[6]  J. Hartigan Statistical theory in clustering , 1985 .

[7]  G. W. Milligan,et al.  An examination of procedures for determining the number of clusters in a data set , 1985 .

[8]  Alan L. Yuille,et al.  Scaling Theorems for Zero Crossings , 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Andrew P. Witkin,et al.  Uniqueness of the Gaussian Kernel for Scale-Space Filtering , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[11]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[12]  Peter J. Rousseeuw,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1990 .

[13]  Michael C. Minnotte,et al.  Nonparametric testing of the existence of modes , 1997 .

[14]  Joachim M. Buhmann,et al.  Pairwise Data Clustering by Deterministic Annealing , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Stephen J. Roberts,et al.  Parametric and non-parametric unsupervised cluster analysis , 1997, Pattern Recognit..

[16]  Pietro Perona,et al.  A Factorization Approach to Grouping , 1998, ECCV.

[17]  Adrian E. Raftery,et al.  How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis , 1998, Comput. J..

[18]  Daniel P. Huttenlocher,et al.  Image segmentation using local variation , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[19]  Andrew W. Moore,et al.  Very Fast EM-Based Mixture Model Clustering Using Multiresolution Kd-Trees , 1998, NIPS.

[20]  Eric J. Pauwels,et al.  Finding Salient Regions in Images: Nonparametric Clustering for Image Segmentation and Grouping , 1999, Comput. Vis. Image Underst..

[21]  G. Henkelman,et al.  A dimer method for finding saddle points on high dimensional potential surfaces using only first derivatives , 1999 .

[22]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[23]  Dorin Comaniciu,et al.  Mean shift analysis and applications , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[24]  Yee Leung,et al.  Clustering by Scale-Space Filtering , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Robert Tibshirani,et al.  Estimating the number of clusters in a data set via the gap statistic , 2000 .

[26]  Jitendra Malik,et al.  Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[27]  G. Henkelman,et al.  A climbing image nudged elastic band method for finding saddle points and minimum energy paths , 2000 .

[28]  Harry Shum,et al.  Image segmentation by data driven Markov chain Monte Carlo , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[29]  D. Comaniciu,et al.  The variable bandwidth mean shift and data-driven scale selection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[30]  Michael Werman,et al.  Self-Organization in Vision: Stochastic Clustering for Image Segmentation, Perceptual Grouping, and Image Database Organization , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  G. Henkelman,et al.  Methods for Finding Saddle Points and Minimum Energy Paths , 2002 .

[32]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..