Comparing the decompositions produced by software clustering algorithms using similarity measurements

Decomposing source code components and relations into subsystem clusters is an active area of research. Numerous clustering approaches have been proposed in the reverse engineering literature, each one using a different algorithm to identify subsystems. Since different clustering techniques may not produce identical results when applied to the same system, mechanisms that can measure the extent of these differences are needed. Some work to measure the similarity between decompositions has been done, but this work considers the assignment of source code components to clusters as the only criterion for similarity. We argue that better similarity measurements can be designed if the relations between the components are considered. The authors propose two similarity measurements that overcome certain problems in existing measurements. We also provide some suggestions on how to identify and deal with source code components that tend to contribute to poor similarity results. We conclude by presenting experimental results, and by highlighting some of the benefits of our similarity measurements.

[1]  Jeffrey L. Korn,et al.  Chava: reverse engineering and tracking of Java applets , 1999, Sixth Working Conference on Reverse Engineering (Cat. No.PR00303).

[2]  Gregor Snelting,et al.  Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis , 1997, Proceedings of the (19th) International Conference on Software Engineering.

[3]  Victor R. Basili,et al.  System Structure Analysis: Clustering with Data Bindings , 1985, IEEE Transactions on Software Engineering.

[4]  Richard C. Holt,et al.  On the stability of software clustering algorithms , 2000, Proceedings IWPC 2000. 8th International Workshop on Program Comprehension.

[5]  Hausi A. Müller,et al.  A reverse-engineering approach to subsystem structure identification , 1993, J. Softw. Maintenance Res. Pract..

[6]  Rainer Koschke,et al.  A framework for experimental evaluation of clustering techniques , 2000, Proceedings IWPC 2000. 8th International Workshop on Program Comprehension.

[7]  Richard C. Holt,et al.  The Orphan Adoption problem in architecture maintenance , 1997, Proceedings of the Fourth Working Conference on Reverse Engineering.

[8]  Song C. Choi,et al.  Extracting and restructuring the design of large systems , 1990, IEEE Software.

[9]  Robert W. Schwanke,et al.  An intelligent tool for re-engineering software modularity , 1991, [1991 Proceedings] 13th International Conference on Software Engineering.

[10]  Nicolas Anquetil,et al.  Recovering software architecture from the names of source files , 1999, J. Softw. Maintenance Res. Pract..

[11]  Richard C. Holt,et al.  MoJo: a distance metric for software clusterings , 1999, Sixth Working Conference on Reverse Engineering (Cat. No.PR00303).

[12]  Emden R. Gansner,et al.  A C++ data model supporting reachability analysis and dead code detection , 1997, ESEC '97/FSE-5.

[13]  Emden R. Gansner,et al.  Using automatic clustering to produce high-level system organizations of source code , 1998, Proceedings. 6th International Workshop on Program Comprehension. IWPC'98 (Cat. No.98TB100242).

[14]  Linda M. Wills,et al.  Reverse Engineering , 1996, Springer US.

[15]  Emden R. Gansner,et al.  Bunch: a clustering tool for the recovery and maintenance of software system structures , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[16]  Arie van Deursen,et al.  Identifying objects using cluster and concept analysis , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[17]  Spiros Mancoridis,et al.  CRAFT: a framework for evaluating software clustering results in the absence of benchmark decompositions [Clustering Results Analysis Framework and Tools] , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[18]  Nicolas Anquetil,et al.  A comparison of graphs of concept for reverse engineering , 2000, Proceedings IWPC 2000. 8th International Workshop on Program Comprehension.

[19]  Nicolas Anquetil,et al.  Experiments with clustering as a software remodularization method , 1999, Sixth Working Conference on Reverse Engineering (Cat. No.PR00303).

[20]  Spiros Mancoridis,et al.  Automatic clustering of software systems using a genetic algorithm , 1999, STEP '99. Proceedings Ninth International Workshop Software Technology and Engineering Practice.