Exposing the Hidden-Web Induced by Ajax

AJAX is a promising approach for improving the interactivity and responsiveness of web applications. At the same time, AJAX techniques increase the amount of hidden-web content by shattering the metaphor of a web 'page', upon which general search engines are based. This paper describes a technique for exposing the hidden-web content behind AJAX by automatically creating a traditional multi-page instance. In particular, we propose a method for crawling AJAX applications and building a state-flow graph that models the various navigation paths and states within an AJAX application. This model is used to generate linked static HTML pages and a corresponding Sitemap. We present our tool, called CRAWLJAX, which implements the concepts discussed in this paper. Additionally, we present a case study in which we apply our approach to two AJAX applications and elaborate on the obtained results.
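The state-flow graph described in the abstract can be sketched as a small data structure: nodes are DOM states, edges are the clickable elements that cause state transitions, and the navigation paths from the index state drive the generation of one static page per reachable state. This is a minimal illustrative sketch, not the CRAWLJAX implementation; all names (`StateFlowGraph`, `add_transition`, `navigation_paths`) are assumptions introduced here.

```python
class StateFlowGraph:
    """Illustrative sketch of a state-flow graph: DOM states as nodes,
    clickables as labeled edges from one state to another."""

    def __init__(self, index_state):
        self.index = index_state
        # state -> list of (clickable, target_state) transitions
        self.edges = {index_state: []}

    def add_transition(self, source, clickable, target):
        """Record that firing `clickable` in `source` leads to `target`."""
        self.edges.setdefault(source, [])
        self.edges.setdefault(target, [])
        self.edges[source].append((clickable, target))

    def navigation_paths(self):
        """Enumerate cycle-free navigation paths from the index state;
        each reachable state could then be serialized as a static HTML
        page and listed in a Sitemap."""
        paths, stack = [], [(self.index, [self.index])]
        while stack:
            state, path = stack.pop()
            paths.append(path)
            for clickable, target in self.edges[state]:
                if target not in path:  # avoid revisiting states on this path
                    stack.append((target, path + [target]))
        return paths
```

For example, an application whose index page exposes a "news" clickable, which in turn exposes an "item" clickable, yields the paths `["index"]`, `["index", "news"]`, and `["index", "news", "item"]`.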
