Using clustering to support the migration from static to dynamic web pages

First-generation Web sites typically consist of a set of purely static Web pages. Content and presentation are not separated, and the same page structure is replicated whenever information with a similar organization has to be presented. This practice creates several problems for the evolution of these sites: content is not easy to update, and each time the HTML structure is modified, the same changes have to be propagated to every replica. This paper proposes an approach for identifying the Web pages that are most amenable to migration to a dynamic version, namely those that share a similar structure filled in with content organized according to a common scheme. Clustering is used for this purpose: a common template is extracted from the pages in the same cluster, and the variable information of the pages matching the template is migrated to a database. A server-side program then extracts the requested information from the database and dynamically generates the HTML pages to be displayed in the browser.
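
The idea of grouping pages by shared structure can be illustrated with a minimal sketch. It is not the algorithm of the paper: the similarity measure (a `difflib` ratio over tag sequences), the greedy threshold-based grouping, the threshold value, and all names (`PageScanner`, `cluster_pages`, `variable_fields`) are assumptions made for illustration only. Treating every text node as variable content is likewise a simplification of template extraction.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Records the tag sequence (structure) and the text nodes (content)."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tags.append("</" + tag + ">")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def scan(html):
    """Parse one page and return the filled-in scanner."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


def structural_similarity(tags_a, tags_b):
    """Similarity in [0, 1] between two tag sequences (assumed measure)."""
    return SequenceMatcher(None, tags_a, tags_b, autojunk=False).ratio()


def cluster_pages(pages, threshold=0.9):
    """Greedy grouping: a page joins the first cluster whose representative
    tag sequence is at least `threshold` similar, else starts a new cluster."""
    clusters = []         # list of lists of page names
    representatives = []  # tag sequence of the first page in each cluster
    for name, html in pages.items():
        tags = scan(html).tags
        for index, rep in enumerate(representatives):
            if structural_similarity(tags, rep) >= threshold:
                clusters[index].append(name)
                break
        else:
            clusters.append([name])
            representatives.append(tags)
    return clusters


def variable_fields(html):
    """Text nodes of a page, i.e. candidate values for a database row."""
    return scan(html).texts


# Hypothetical sample pages: two share a structure, one does not.
pages = {
    "a.html": "<html><body><h1>Alice</h1><p>Bio A</p></body></html>",
    "b.html": "<html><body><h1>Bob</h1><p>Bio B</p></body></html>",
    "c.html": "<html><body><ul><li>x</li><li>y</li></ul></body></html>",
}
```

With these sample pages, `cluster_pages(pages)` groups `a.html` and `b.html` together and leaves `c.html` on its own, and `variable_fields(pages["a.html"])` yields the two text values that a server-side program could later read back from a database row to regenerate the page from the shared template.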
