An Agglomerative Clustering Method for Large Data Sets

In data mining, agglomerative clustering algorithms are widely used because of their flexibility and conceptual simplicity. Their main drawback, however, is that they are slow. In this paper, a simple agglomerative clustering algorithm with low computational complexity is proposed. The method is especially convenient for clustering large data sets, and it could also be used as a linear-time initialization method for other clustering algorithms, such as the commonly used k-means algorithm. Experiments conducted on some standard data sets confirm that the proposed approach is effective.

General Terms: Clustering, Algorithms.
