Methods for selecting and improving software clustering algorithms

Several software clustering algorithms have been proposed in the literature, each with its own strengths and weaknesses. Most of these algorithms have been applied to particular software systems with considerable success. However, no algorithm has been shown to be superior in all cases. As a result, selecting the software clustering algorithm best suited to a specific software system remains a hard problem. At the same time, improving the effectiveness of an existing algorithm is a time-consuming process that would benefit from a methodology allowing an idea to be evaluated early. In this paper, we first introduce a formal description template for software clustering algorithms. Based on this template, we propose a novel method for selecting a software clustering algorithm for specific needs, as well as a method for improving software clustering algorithms. The applicability and usefulness of the two methods are demonstrated by applying them in four distinct case studies. Copyright © 2012 John Wiley & Sons, Ltd.
