Evaluation methods for Web application clustering

Clustering of the entities composing a Web application (static and dynamic pages) can be used to support program understanding. However, several alternative options are available when a clustering technique is designed for Web applications. The entities to be clustered can be described in different ways (e.g., by their structure, by their connectivity, or by their content), different similarity measures are possible, and alternative procedures can be used to form the clusters. The problem is how to evaluate the competing clustering techniques in order to select the best one for program understanding purposes. In this paper, two methods for clustering evaluation are considered: the gold standard approach and the task-oriented approach. The advantages and disadvantages of both are analyzed in detail. Defining a gold standard (a reference clustering) is difficult and prone to subjectivity. On the other hand, an evaluation based on the level of support given to task execution is expensive and requires careful experimental design. Guidelines and examples are provided for the implementation of both methods.
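
As an illustration of the gold standard approach, the sketch below scores a candidate clustering against a reference clustering using pairwise precision and recall over co-clustered pages. This is only one possible comparison measure, chosen here for simplicity; it is not necessarily the measure used in the paper. The page names, cluster labels, and function names are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(clustering):
    """Return the set of unordered page pairs that share a cluster.

    `clustering` maps a cluster label to the set of pages it contains.
    """
    pairs = set()
    for members in clustering.values():
        # Sorting gives each unordered pair a single canonical form.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_precision_recall(candidate, gold):
    """Compare a candidate clustering with a gold standard.

    Precision: fraction of candidate pairs also grouped in the gold standard.
    Recall: fraction of gold standard pairs recovered by the candidate.
    """
    cand = co_clustered_pairs(candidate)
    ref = co_clustered_pairs(gold)
    common = cand & ref
    precision = len(common) / len(cand) if cand else 1.0
    recall = len(common) / len(ref) if ref else 1.0
    return precision, recall


# Hypothetical reference clustering defined by an expert.
gold = {
    "catalog": {"list.php", "item.php", "search.php"},
    "account": {"login.php", "profile.php"},
}

# Hypothetical output of an automatic clustering technique.
candidate = {
    "c1": {"list.php", "item.php"},
    "c2": {"search.php", "login.php", "profile.php"},
}

precision, recall = pairwise_precision_recall(candidate, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A score of this kind is only as reliable as the reference clustering itself, which is the source of the subjectivity noted above.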
