Bagging for Path-Based Clustering

A resampling scheme for clustering with similarity to bootstrap aggregation (bagging) is presented. Bagging is used to improve the quality of path-based clustering, a data clustering method that can extract elongated structures from data in a noise robust way. The results of an agglomerative optimization method are influenced by small fluctuations of the input data. To increase the reliability of clustering solutions, a stochastic resampling method is developed to infer consensus clusters. A related reliability measure allows us to estimate the number of clusters, based on the stability of an optimized cluster solution under resampling. The quality of path-based clustering with resampling is evaluated on a large image data set of human segmentations.

[1]  Ana L. N. Fred,et al.  Data clustering using evidence accumulation , 2002, Object recognition supported by user interaction for service robots.

[2]  Michael Werman,et al.  Self-Organization in Vision: Stochastic Clustering for Image Segmentation, Perceptual Grouping, and Image Database Organization , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Jianbo Shi,et al.  Learning Segmentation by Random Walks , 2000, NIPS.

[4]  Joachim M. Buhmann,et al.  Pairwise Data Clustering by Deterministic Annealing , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Ana L. N. Fred,et al.  Clustering under a hypothesis of smooth dissimilarity increments , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[6]  Joachim M. Buhmann,et al.  Histogram clustering for unsupervised segmentation and image retrieval , 1999, Pattern Recognit. Lett..

[7]  Joachim M. Buhmann,et al.  Path-Based Clustering for Grouping of Smooth Curves and Texture Segmentation , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[9]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[10]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[11]  Sandrine Dudoit,et al.  Applications of Resampling Methods to Estimate the Number of Clusters and to Improve the Accuracy of , 2001 .

[12]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Naftali Tishby,et al.  Data Clustering by Markovian Relaxation and the Information Bottleneck Method , 2000, NIPS.

[14]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[15]  Eytan Domany,et al.  Resampling Method for Unsupervised Estimation of Cluster Validity , 2001, Neural Computation.

[16]  Emanuele Trucco,et al.  Robust motion and correspondence of noisy 3-D point sets with missing data , 1999, Pattern Recognit. Lett..

[17]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[18]  Naftali Tishby,et al.  Distributional Clustering of English Words , 1993, ACL.

[19]  David R. Karger,et al.  A new approach to the minimum cut problem , 1996, JACM.

[20]  David G. Stork,et al.  Pattern Classification , 1973 .

[21]  Eytan Domany,et al.  Data Clustering Using a Model Granular Magnet , 1997, Neural Computation.