A framework for experimental evaluation of clustering techniques

Clustering techniques for component recovery must be evaluated experimentally to analyze their strengths and weaknesses relative to other techniques. Comparable evaluations of automatic clustering techniques require a common reference corpus of freely available systems whose actual components are known. The reference corpus is used to measure the recall and precision of automatic techniques, which in turn requires a standard scheme for comparing the components recovered by a clustering technique with the components in the reference corpus. This paper describes both the process of setting up reference corpora and ways of measuring the recall and precision of automatic clustering techniques. Methods that involve human intervention should instead be evaluated in controlled experiments. This paper therefore also proposes a controlled experiment, one that can be conducted cost-effectively, as a standard for evaluating manual and semi-automatic component recovery methods.
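To make the idea of measuring recall and precision against a reference corpus concrete, the following Python sketch compares candidate components with reference components. It is not the comparison scheme proposed in the paper, which the abstract does not spell out. It assumes that a component is a set of entity names, that a candidate matches a reference when their Jaccard overlap reaches a threshold, and that the threshold is 0.7. The names `jaccard` and `precision_recall` and all example data are hypothetical.

```python
from typing import Iterable, Set, Tuple


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap of two components: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def precision_recall(
    candidates: Iterable[Set[str]],
    references: Iterable[Set[str]],
    threshold: float = 0.7,  # assumed cut-off, not taken from the paper
) -> Tuple[float, float]:
    """Return (precision, recall) of candidate components.

    precision: share of candidates that match some reference component.
    recall:    share of reference components matched by some candidate.
    """
    cands = [frozenset(c) for c in candidates]
    refs = [frozenset(r) for r in references]
    matched_cands = sum(
        1 for c in cands if any(jaccard(c, r) >= threshold for r in refs)
    )
    matched_refs = sum(
        1 for r in refs if any(jaccard(c, r) >= threshold for c in cands)
    )
    precision = matched_cands / len(cands) if cands else 0.0
    recall = matched_refs / len(refs) if refs else 0.0
    return precision, recall


# Hypothetical reference corpus: two known components of a small system.
references = [
    {"Stack", "stack_push", "stack_pop"},
    {"List", "list_add", "list_remove"},
]

# Hypothetical output of an automatic clustering technique.
candidates = [
    {"Stack", "stack_push", "stack_pop"},        # exact match
    {"List", "list_add", "list_remove", "main"},  # close match (overlap 0.75)
    {"util_log"},                                 # no counterpart
]

precision, recall = precision_recall(candidates, references)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example two of the three candidates match a reference component and both reference components are found, so precision is 2/3 and recall is 1. A stricter or looser threshold changes which approximate matches count, which is one reason a shared, standard comparison scheme matters for comparable results.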
