Paper information - Client-side deep Web data extraction

The problem of data extraction from the deep Web can be divided into two tasks: crawling the client-side deep Web and crawling the server-side deep Web. The objective is to define an architecture and a set of related techniques for accessing the information placed in the client-side deep Web. This involves dealing with aspects such as JavaScript technology, nonstandard session maintenance mechanisms, client redirections, and pop-up menus. We use current browser APIs as building blocks and leverage them to implement novel crawling models and algorithms.

M. Alvarez | A. Pan | J. Raposo | A. Vina
