Evaluating similarity measures for software decompositions

A central question for any similarity measure for software decompositions is whether to assess discrepancies in terms of the nodes of a decomposition, or in terms of differences in how the edges of the system's dependency graph are clustered. We argue that considering either nodes or edges in isolation is too one-sided. We outline the shortcomings of previous approaches and introduce the first dissimilarity measure that takes both nodes and edges into account. We also present experiments on real and synthetic data sets that illustrate how the various measures differ.
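
To make the node-versus-edge distinction concrete, the following is a minimal sketch and not the measure introduced in this paper. It assumes a decomposition is given as a mapping from node name to cluster label, and it uses two deliberately simple, hypothetical measures: the fraction of node pairs whose co-membership differs between two decompositions, and the fraction of dependency edges whose intra-cluster or inter-cluster status differs. All names and data are illustrative.

```python
from itertools import combinations


def node_pair_disagreement(a, b):
    """Fraction of node pairs that are grouped together in exactly one
    of the two decompositions (a node-only view; ignores dependencies)."""
    pairs = list(combinations(sorted(a), 2))
    if not pairs:
        return 0.0
    differing = sum((a[u] == a[v]) != (b[u] == b[v]) for u, v in pairs)
    return differing / len(pairs)


def edge_disagreement(a, b, edges):
    """Fraction of dependency edges that are intra-cluster in exactly one
    of the two decompositions (an edge-only view)."""
    if not edges:
        return 0.0
    differing = sum((a[u] == a[v]) != (b[u] == b[v]) for u, v in edges)
    return differing / len(edges)


# Hypothetical system: four files, two decompositions that differ only
# in where f2 is placed. f2 has no dependencies in this toy graph.
decomposition_a = {"f1": "X", "f2": "X", "f3": "Y", "f4": "Y"}
decomposition_b = {"f1": "X", "f2": "Y", "f3": "Y", "f4": "Y"}
dependency_edges = [("f1", "f3"), ("f3", "f4")]

node_view = node_pair_disagreement(decomposition_a, decomposition_b)
edge_view = edge_disagreement(decomposition_a, decomposition_b, dependency_edges)
print(node_view, edge_view)
```

In this toy case the node-only view reports a disagreement because f2 moved, while the edge-only view reports none because no dependency edge changed status. Each view alone misses information that the other captures.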
