Preloading Browsers for Optimizing Automatic Access to Hidden Web: A Ranking-Based Repository Solution

As Web applications grow in number and quality, vertical solutions can increasingly use them as an important source of information. Nevertheless, obtaining information from web sources is challenging: access is complicated by the hypertext browsing paradigm and by HTML's semistructured format. Web Automation middleware navigates through web links and fills in web forms automatically in order to extract information from the Hidden Web. The main optimization parameter is the time required to navigate through the intermediate pages that lead to the desired results. This work proposes a technique that reduces browsing time by storing information from previous queries and using it to preload an adequate subset of the navigational sequence on a specific browser before the next sequence is launched. It also takes into account which sequences are most commonly used, so that these are preloaded more often.
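
The idea of a ranking-based repository can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SequenceRepository`, `plan_preloads`, and `remaining_steps`, the string step labels, and the fixed `prefix_len` are all hypothetical. It also assumes that a preloaded browser is reusable only when everything it has preloaded matches the start of the incoming sequence.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Step = str  # hypothetical label for one navigation action (follow a link, fill a form)


class SequenceRepository:
    """Stores how often each navigation sequence was executed by past queries."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, sequence: Sequence[Step]) -> None:
        # Tuples are hashable, so a whole sequence can serve as a counter key.
        self._counts[tuple(sequence)] += 1

    def top(self, k: int) -> List[Tuple[Step, ...]]:
        # The ranking: the k most frequently used sequences, most used first.
        return [seq for seq, _ in self._counts.most_common(k)]


def plan_preloads(repo: SequenceRepository, pool_size: int,
                  prefix_len: int) -> List[Tuple[Step, ...]]:
    """Give each pooled browser the first prefix_len steps of a top-ranked sequence."""
    return [seq[:prefix_len] for seq in repo.top(pool_size)]


def common_prefix_len(a: Sequence[Step], b: Sequence[Step]) -> int:
    """Number of leading steps that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def remaining_steps(query: Sequence[Step],
                    preloaded: List[Tuple[Step, ...]]
                    ) -> Tuple[Optional[int], List[Step]]:
    """Pick the browser whose preloaded steps cover the longest start of the query.

    Returns (browser index or None, steps still to be executed).
    """
    best_idx, best_len = None, 0
    for i, steps in enumerate(preloaded):
        n = common_prefix_len(query, steps)
        # Assumption: a browser is usable only if all of its preloaded steps match.
        if n == len(steps) and n > best_len:
            best_idx, best_len = i, n
    return best_idx, list(query[best_len:])


if __name__ == "__main__":
    repo = SequenceRepository()
    flights = ["open_home", "login", "open_search", "submit_form"]
    hotels = ["open_home", "login", "open_hotels"]
    for _ in range(3):
        repo.record(flights)
    repo.record(hotels)

    browsers = plan_preloads(repo, pool_size=1, prefix_len=2)
    print(remaining_steps(flights, browsers))
```

In this sketch the saving comes from the steps that `remaining_steps` skips: a query whose sequence starts with an already preloaded prefix only has to execute the tail, while a query with no matching browser falls back to running the full sequence.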
