A method of two stage clustering using agglomerative hierarchical algorithms with one-pass k-means++ or k-median++

This paper proposes a two-stage clustering method in which the first stage uses one-pass k-median++ and the second stage uses agglomerative hierarchical clustering. To handle medians in the second stage, we propose two calculation methods. The first uses the L1 distance as the similarity measure. The second uses an error based on the L1 distance, analogous to the Ward method. To examine the effectiveness of the L1 distance in two-stage methods, we compare the proposed method with a two-stage method from our study that uses k-means++ in the first stage. Numerical experiments evaluate the methods on two criteria: objective function values and the Rand index.
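
As a rough illustration of the two-stage idea, the following Python sketch seeds centers with k-means++-style sampling under the L1 distance, assigns every point in a single pass, and then merges the resulting groups agglomeratively by the L1 distance between their coordinate-wise medians. It is not the authors' implementation: the function names (`seed_l1`, `one_pass_k_median`, `agglomerate_l1`, `two_stage_l1`), the exact form of the single pass, and the choice to recompute medians from the member points after each merge are all assumptions made for the example. The Ward-like variant is not shown.

```python
import numpy as np

def l1(X, c):
    # L1 (Manhattan) distance from every row of X to the point c.
    return np.abs(X - c).sum(axis=1)

def seed_l1(X, k, rng):
    # k-means++-style seeding, with sampling weights proportional to the
    # L1 distance to the nearest chosen center (assumed k-median++ variant).
    # Assumes X contains at least k distinct points.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([l1(X, c) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d / d.sum())])
    return np.array(centers)

def one_pass_k_median(X, k, rng):
    # Stage 1 (assumed form): seed once, then assign each point to its
    # nearest seed in a single pass, with no further iterations.
    centers = seed_l1(X, k, rng)
    return np.argmin(np.stack([l1(X, c) for c in centers]), axis=0)

def agglomerate_l1(X, labels, n_clusters):
    # Stage 2 (assumed form): repeatedly merge the two groups whose
    # coordinate-wise medians are closest in L1 distance.
    groups = [np.flatnonzero(labels == j) for j in np.unique(labels)]
    while len(groups) > n_clusters:
        meds = np.array([np.median(X[g], axis=0) for g in groups])
        D = np.abs(meds[:, None, :] - meds[None, :, :]).sum(axis=2)
        np.fill_diagonal(D, np.inf)
        i, j = np.unravel_index(np.argmin(D), D.shape)
        i, j = int(min(i, j)), int(max(i, j))
        groups[i] = np.concatenate([groups[i], groups[j]])
        del groups[j]
    out = np.empty(len(X), dtype=int)
    for c, g in enumerate(groups):
        out[g] = c
    return out

def two_stage_l1(X, k_first, n_clusters, seed=0):
    # Run stage 1 with k_first small clusters, then merge to n_clusters.
    rng = np.random.default_rng(seed)
    first = one_pass_k_median(X, k_first, rng)
    return agglomerate_l1(X, first, n_clusters)
```

A call such as `two_stage_l1(X, k_first=10, n_clusters=2)` first produces up to ten small groups and then merges them down to two. Choosing `k_first` well above the target number of clusters keeps the hierarchical stage cheap, since it operates on groups rather than on individual points.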
