A Method of Two-Stage Clustering with Constraints Using Agglomerative Hierarchical Algorithm and One-Pass k-Means++

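This paper proposes a two-stage clustering method in which the first stage uses one-pass k-means++ and the second stage uses an agglomerative hierarchical algorithm. The method outperforms an earlier two-stage algorithm by replacing the one-pass k-means of the first stage with one-pass k-means++. Pairwise constraints are also taken into account to improve performance. Numerical examples show the effectiveness of the proposed method.

1 Introduction

Clustering [11] is a technique that automatically classifies data into several groups without any external criterion. In recent years, semi-supervised clustering [1], which aims to improve classification accuracy by supplying an external criterion for only some of the objects, has also been studied actively.

Clustering methods can be broadly divided into hierarchical and non-hierarchical methods. Hierarchical methods are computationally expensive, which makes them hard to apply to large data sets. To mitigate this drawback, Obara et al. proposed two-stage clustering [5][6]. In the first stage, one-pass k-means [3] is run; in the second stage, the resulting cluster centers are classified with a hierarchical method. However, this approach inherits a drawback from the one-pass k-means used in the first stage: its result depends on the initial values.

This study aims to reduce the initial-value dependence of two-stage clustering by using one-pass k-means++ [2] in the first stage. In addition, to further improve the classification rate, we introduce pairwise constraints [7][9], a form of semi-supervision, and examine their effect.

2 Hierarchical Clustering

To describe hierarchical clustering, we first define the set of objects as $X = \{x_1, \dots, x_n\}$ and the set of clusters as $G = \{G_1, G_2, \dots, G_C\}$. Each $x_i$ $(i = 1, \dots, n)$ is a point $x_i = (x_{i1}, \dots, x_{ip})$ in $p$-dimensional Euclidean space. In this case, ...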
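The two-stage procedure outlined in the introduction can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the excerpt does not confirm: "one-pass" is taken to mean a single sequential pass in which each object is assigned to its nearest center and that center is updated as a running mean; the second stage uses the centroid method as its linkage; and pairwise constraints are omitted. The function names `kmeanspp_seeds`, `one_pass_kmeans`, `agglomerate`, and `two_stage` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """k-means++ seeding: each new seed is drawn with probability
    proportional to its squared distance from the nearest chosen seed."""
    seeds = [list(rng.choice(points))]
    d2 = [sq_dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        total = sum(d2)
        if total == 0:  # every point coincides with an existing seed
            break
        r = rng.random() * total
        acc, chosen = 0.0, None
        for idx, w in enumerate(d2):
            if w == 0:
                continue
            acc += w
            chosen = idx
            if acc >= r:
                break
        seeds.append(list(points[chosen]))
        d2 = [min(old, sq_dist(p, seeds[-1])) for old, p in zip(d2, points)]
    return seeds


def one_pass_kmeans(points, centers):
    """Assumed one-pass rule: visit each point once, assign it to the
    nearest center, and move that center to the mean of its members."""
    centers = [list(c) for c in centers]
    counts = [0] * len(centers)
    labels = []
    for p in points:
        j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        counts[j] += 1
        centers[j] = [cj + (pj - cj) / counts[j] for cj, pj in zip(centers[j], p)]
        labels.append(j)
    return centers, counts, labels


def agglomerate(centers, counts, n_clusters):
    """Second stage (assumed centroid method): repeatedly merge the two
    clusters whose size-weighted centroids are closest."""
    clusters = [
        {"centroid": list(c), "size": n, "members": [i]}
        for i, (c, n) in enumerate(zip(centers, counts))
        if n > 0  # skip first-stage centers that received no points
    ]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        a, b = min(
            pairs,
            key=lambda ij: sq_dist(clusters[ij[0]]["centroid"], clusters[ij[1]]["centroid"]),
        )
        ca, cb = clusters[a], clusters[b]
        size = ca["size"] + cb["size"]
        merged = {
            "centroid": [
                (ca["size"] * x + cb["size"] * y) / size
                for x, y in zip(ca["centroid"], cb["centroid"])
            ],
            "size": size,
            "members": ca["members"] + cb["members"],
        }
        clusters = [c for t, c in enumerate(clusters) if t not in (a, b)] + [merged]
    return clusters


def two_stage(points, k_first, n_clusters, seed=0):
    """Run the first stage with k_first centers, merge the centers down to
    n_clusters, and return one final cluster label per input point."""
    rng = random.Random(seed)
    seeds = kmeanspp_seeds(points, k_first, rng)
    centers, counts, labels = one_pass_kmeans(points, seeds)
    clusters = agglomerate(centers, counts, n_clusters)
    center_to_cluster = {m: g for g, c in enumerate(clusters) for m in c["members"]}
    return [center_to_cluster[lab] for lab in labels]
```

Calling `two_stage(points, k_first=50, n_clusters=3)` on a list of coordinate tuples would return one label per point. The pairwise search in `agglomerate` is quadratic in the number of first-stage centers, which is the reason for keeping `k_first` much smaller than the number of objects.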

[1] Sergei Vassilvitskii, et al. k-means++: the advantages of careful seeding, 2007, SODA '07.

[2] Sadaaki Miyamoto, et al. A method of two-stage clustering with constraints using agglomerative hierarchical algorithm and one-pass k-means, 2012, The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems.

[3] Sadaaki Miyamoto, et al. Constrained agglomerative hierarchical clustering algorithms with penalties, 2011, 2011 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2011).

[4] Raymond J. Mooney, et al. A probabilistic framework for semi-supervised clustering, 2004, KDD.

[5] Claire Cardie, et al. Constrained K-means Clustering with Background Knowledge, 2001, Proceedings of the Eighteenth International Conference on Machine Learning, pp. 577–584.