Clustering with Minimum Spanning Trees

We propose two clustering algorithms based on the Euclidean minimum spanning tree (EMST): a k-constrained algorithm and an unconstrained one. Our k-constrained algorithm produces a k-partition of a set of points for any given k. It constructs a minimum spanning tree of a set of representative points and removes the edges that satisfy a predefined criterion, repeating the process until k clusters are produced. Our unconstrained algorithm partitions a point set into clusters without a prescribed number of clusters, by maximally reducing the overall standard deviation of the edges in the EMST constructed from the point set. We present experimental results comparing the proposed algorithms with k-means, X-means, CURE, Chameleon, and the Expectation-Maximization (EM) algorithm on both artificial data and benchmark data from the UCI repository. We also apply our algorithms to image color clustering and compare them with the standard m...
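The two ideas in the abstract can be illustrated with a minimal sketch, written under explicit assumptions. The abstract does not state the edge-removal criterion or how representative points are chosen, so the k-constrained part here builds the EMST over all points and cuts the longest edges (the classic single-linkage criterion), not necessarily the authors' criterion. The unconstrained part assumes "overall standard deviation" means the population standard deviation of the remaining edge lengths, and uses a hypothetical `min_gain` threshold as its stopping rule. All function names are illustrative only.

```python
import math
from statistics import pstdev


def emst_edges(points):
    """Euclidean MST via O(n^2) Prim's algorithm; returns (length, i, j) tuples."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from each point to the tree
    parent = [-1] * n       # tree vertex achieving that distance
    edges = []
    if n == 0:
        return edges
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def components(n, edges):
    """Label the connected components of the forest formed by `edges` (union-find)."""
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, a, b in edges:
        root[find(a)] = find(b)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def k_constrained(points, k):
    """Hypothetical sketch: cut the k - 1 longest EMST edges, leaving k subtrees.
    ASSUMPTION: longest-edge criterion and no representative-point step."""
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= number of points")
    edges = sorted(emst_edges(points))  # ascending by edge length
    return components(len(points), edges[:len(edges) - (k - 1)])


def unconstrained(points, min_gain=0.1):
    """Hypothetical sketch: repeatedly cut the edge whose removal most lowers the
    population std of the remaining edge lengths.
    ASSUMPTION: stop once the relative reduction falls below min_gain."""
    edges = emst_edges(points)
    while len(edges) > 1:
        weights = [w for w, _, _ in edges]
        sd = pstdev(weights)
        if sd == 0:
            break
        # std of the remaining lengths if edge i were removed
        after = [pstdev(weights[:i] + weights[i + 1:]) for i in range(len(weights))]
        j = min(range(len(after)), key=after.__getitem__)
        if (sd - after[j]) / sd < min_gain:
            break
        edges.pop(j)
    return components(len(points), edges)
```

For example, on two well-separated triangles such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]`, both `k_constrained(points, 2)` and `unconstrained(points)` should separate the two groups, because the single long bridge edge dominates the spread of the EMST edge lengths. Cutting the k - 1 longest edges at once is equivalent to repeatedly removing the longest remaining edge until k subtrees exist; the O(n^2) Prim's algorithm is chosen only for brevity.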