Enhancing Navigability in Websites Built Using Web Content Management Systems

Websites built using Web Content Management Systems (WCMSs) usually provide users with three types of access structures for browsing their content: indexes of categories, breadcrumb trails, and sitemaps. In addition, users can run full-text searches of varying sophistication to find content of interest. In this paper we propose an automatic approach that extends the navigation structure of WCMS-based websites with Semantic Navigation Maps (SNMs), a complementary navigation structure that links contents, and lets users navigate among them, based on their lexical similarity. The approach uses an information retrieval technique, Latent Semantic Indexing, to identify lexical similarities between textual contents, and a fuzzy clustering algorithm to group similar web pages. For each page of the website, it provides a set of navigation links to pages with similar content, each annotated with a measure of that similarity. The paper presents the approach for generating SNMs, an implementation for the Joomla! open source WCMS, and the results of an empirical evaluation on two real-world websites built with this WCMS.
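
The core idea of linking pages by lexical similarity can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: it assumes NumPy, whitespace tokenization, raw term counts, and a fixed similarity threshold in place of the fuzzy clustering step, and the names `lsi_similarity`, `navigation_links`, `k`, and `threshold` are hypothetical.

```python
import numpy as np


def lsi_similarity(pages, k=2):
    """Return a page-by-page cosine similarity matrix in a k-dimensional LSI space."""
    # Build the vocabulary and a term-by-document count matrix.
    tokenized = [page.lower().split() for page in pages]
    vocab = sorted({term for doc in tokenized for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(pages)))
    for j, doc in enumerate(tokenized):
        for term in doc:
            counts[index[term], j] += 1.0

    # Truncated SVD: keep only the k largest singular values (the LSI step).
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    docs = (np.diag(s[:k]) @ vt[:k]).T  # one row per page in the reduced space

    # Cosine similarity between the reduced page vectors.
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty pages
    unit = docs / norms
    return unit @ unit.T


def navigation_links(pages, k=2, threshold=0.5):
    """For each page, list (other_page_index, similarity) pairs, most similar first.

    Assumption: a plain threshold stands in for the fuzzy clustering
    used in the approach described above.
    """
    sim = lsi_similarity(pages, k)
    links = {}
    for i in range(len(pages)):
        ranked = sorted(
            ((float(sim[i, j]), j) for j in range(len(pages))
             if j != i and sim[i, j] >= threshold),
            reverse=True,
        )
        links[i] = [(j, round(score, 3)) for score, j in ranked]
    return links
```

Calling `navigation_links(["cat dog pet", "dog cat animal pet", "stock market bank"])` returns a dictionary that maps each page index to its similar pages and their similarity scores. A real deployment would likely weight terms (for example with tf-idf), remove stop words, and choose `k` empirically.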
