Paper Information - k-Attractors: A Clustering Algorithm for Software Measurement Data Analysis


Clustering is particularly useful in problems where little prior information is available about the data under analysis. This is usually the case when evaluating a software system's maintainability, because many dimensions must be taken into account to reach a conclusion. However, partitional clustering algorithms are sensitive to noise and to the initial partitioning. In this paper we propose a novel partitional clustering algorithm, k-Attractors. It employs maximal frequent itemset discovery and partitioning to define the number of desired clusters and the initial cluster attractors. It then uses a similarity measure adapted to the way the initial attractors are determined. We apply the k-Attractors algorithm to two custom industrial systems and compare it with WEKA's implementation of K-Means. We present preliminary results showing that our approach performs better in terms of clustering accuracy and speed.
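The abstract does not spell out how attractors are derived from the itemsets, so the following is only a minimal sketch of the underlying notion of a maximal frequent itemset: an itemset that meets a support threshold and has no frequent proper superset. The function name, the toy transactions, the `min_support` value, and the idea of reading the number of maximal itemsets as a candidate cluster count are all illustrative assumptions, not the paper's procedure.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force search for maximal frequent itemsets (small inputs only)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = number of transactions that contain every item of the candidate.
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent.append(itemset)
                found_any = True
        if not found_any:
            # If no itemset of this size is frequent, no larger one can be.
            break
    # Keep only the itemsets that have no frequent proper superset.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy data: each transaction lists discretised metric levels of one class.
transactions = [
    {"high_coupling", "low_cohesion", "large_size"},
    {"high_coupling", "low_cohesion"},
    {"high_coupling", "low_cohesion", "large_size"},
    {"low_coupling", "high_cohesion"},
    {"low_coupling", "high_cohesion", "small_size"},
]

mfis = maximal_frequent_itemsets(transactions, min_support=2)
k = len(mfis)  # assumption: one candidate cluster per maximal frequent itemset
```

On this toy input, two maximal frequent itemsets are found, one grouping the high-coupling classes and one grouping the low-coupling classes. Exhaustive enumeration is exponential in the number of items, so a real analysis would rely on a dedicated frequent-itemset miner.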
