Exposing the Hidden-Web Induced by Ajax

AJAX is a promising approach for improving the interactivity and responsiveness of web applications. At the same time, AJAX techniques increase the amount of hidden-web content by shattering the metaphor of a web 'page', upon which general search engines are based. This paper describes a technique for exposing the hidden-web content behind AJAX by automatically creating a traditional multi-page instance. In particular, we propose a method for crawling AJAX applications and building a state-flow graph that models the various navigation paths and states within an AJAX application. This model is used to generate linked static HTML pages and a corresponding Sitemap. We present our tool, called CRAWLJAX, which implements the concepts discussed in this paper. Additionally, we present a case study in which we apply our approach to two AJAX applications and elaborate on the obtained results.
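The state-flow graph described in the abstract can be sketched as a small data structure: nodes are DOM states, edges are the clickable elements that cause state transitions, and the navigation paths from the index state drive the generation of one static page per reachable state. This is a minimal illustrative sketch, not the CRAWLJAX implementation; all names (`StateFlowGraph`, `add_transition`, `navigation_paths`) are assumptions introduced here.

```python
class StateFlowGraph:
    """Illustrative sketch of a state-flow graph: DOM states as nodes,
    clickables as labeled edges from one state to another."""

    def __init__(self, index_state):
        self.index = index_state
        # state -> list of (clickable, target_state) transitions
        self.edges = {index_state: []}

    def add_transition(self, source, clickable, target):
        """Record that firing `clickable` in `source` leads to `target`."""
        self.edges.setdefault(source, [])
        self.edges.setdefault(target, [])
        self.edges[source].append((clickable, target))

    def navigation_paths(self):
        """Enumerate cycle-free navigation paths from the index state;
        each reachable state could then be serialized as a static HTML
        page and listed in a Sitemap."""
        paths, stack = [], [(self.index, [self.index])]
        while stack:
            state, path = stack.pop()
            paths.append(path)
            for clickable, target in self.edges[state]:
                if target not in path:  # avoid revisiting states on this path
                    stack.append((target, path + [target]))
        return paths
```

For example, an application whose index page exposes a "news" clickable, which in turn exposes an "item" clickable, yields the paths `["index"]`, `["index", "news"]`, and `["index", "news", "item"]`.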
