Using semantic clustering to enhance the navigation structure of Web sites

This paper presents an automatic approach based on semantic clustering to enhance the navigation structure of Web sites. The approach extends the navigation structure of a Web site by introducing a set of links that enable navigation from each page of the site to other pages showing similar or related content. It uses Latent Semantic Indexing (LSI) to compute a dissimilarity measure between the pages of the site and a graph-theoretic clustering algorithm to group pages with similar or related content. The additional links, which connect each page of the site to the other pages in the same cluster, are injected dynamically into each page using AJAX code. A prototype of a supporting tool and the results of a case study conducted to assess the feasibility of the approach are also presented.
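The pipeline summarized above (LSI-based dissimilarity, graph-theoretic clustering, per-page link sets) can be sketched in Python. This is a minimal sketch under assumptions the abstract does not settle: pages are weighted by raw term counts, the number of latent dimensions `k` and the dissimilarity `threshold` are chosen by hand, and clusters are taken as connected components of a graph joining pages whose dissimilarity falls below the threshold, which may differ from the paper's own graph-theoretic algorithm. To stay dependency-free, a small Jacobi eigen-solver applied to `A^T A` stands in for a library SVD. All function names and sample pages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens (a real tool would also drop stop words and stem)."""
    return re.findall(r"[a-z]+", text.lower())


def jacobi_eigen(m, max_sweeps=100, tol=1e-18):
    """Eigenvalues and eigenvectors (as columns) of a small symmetric matrix."""
    n = len(m)
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q] (classic cyclic Jacobi).
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [a[i][i] for i in range(n)], v


def lsi_dissimilarity(pages, k=2):
    """Pairwise 1 - cosine similarity of pages projected into a k-dimensional LSI space."""
    names = list(pages)
    counts = [Counter(tokenize(pages[name])) for name in names]
    vocab = sorted(set().union(*counts))
    n = len(names)
    # Term-by-document matrix A (raw counts; the weighting scheme is an assumption).
    A = [[c[term] for c in counts] for term in vocab]
    # Eigen-decomposing A^T A yields V and S^2 of the SVD A = U S V^T.
    gram = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    eigvals, eigvecs = jacobi_eigen(gram)
    top = sorted(range(n), key=lambda i: eigvals[i], reverse=True)[:k]
    # Each page becomes a row of V_k S_k, i.e. its coordinates in the latent space.
    docs = [[eigvecs[d][i] * math.sqrt(max(eigvals[i], 0.0)) for i in top] for d in range(n)]

    def cosine(x, y):
        nx = math.sqrt(sum(a * a for a in x))
        ny = math.sqrt(sum(b * b for b in y))
        return sum(a * b for a, b in zip(x, y)) / (nx * ny) if nx and ny else 0.0

    return names, [[1.0 - cosine(docs[i], docs[j]) for j in range(n)] for i in range(n)]


def threshold_clusters(names, dis, threshold=0.3):
    """Assumed stand-in clustering: connected components of the graph that
    links pages whose dissimilarity is below the threshold."""
    n = len(names)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(names[i])
            for j in range(n):
                if j not in seen and dis[i][j] < threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters


def related_links(clusters):
    """Map each page to the other pages of its cluster: the links to inject into it."""
    return {page: [other for other in cluster if other != page]
            for cluster in clusters for page in cluster}


if __name__ == "__main__":
    pages = {  # hypothetical sample site
        "news.html": "football match goal team",
        "sport.html": "team football league goal",
        "recipe.html": "pasta sauce tomato cook",
        "kitchen.html": "cook pasta oven tomato",
    }
    names, dis = lsi_dissimilarity(pages, k=2)
    for page, links in related_links(threshold_clusters(names, dis)).items():
        print(page, "->", links)
```

In a deployed tool, a mapping like the one returned by `related_links` is the kind of data an AJAX request could fetch to render a list of related pages on each page; the sketch stops at computing it.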
