Relational Web Wrapper: A Web Data Extraction Approach

The amount of information on the internet is growing rapidly, so web data extraction systems are needed to retrieve the information a user requires. One such technique is the web wrapper, a supervised learning approach in which the programmer develops a template (a program) to extract specific data. This paper presents a web wrapper, called the Relational Web Wrapper, that extracts related information from a webpage. The wrapper is evaluated on the basis of extraction time and accuracy, and the results show that it performs efficiently.
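To make the idea of a wrapper as a programmer-written template concrete, the following is a minimal sketch of a generic delimiter-based wrapper. It is not the Relational Web Wrapper itself, whose details the abstract does not give. The names `extract_between`, `wrap`, `SAMPLE_PAGE`, and `TEMPLATE` are hypothetical and chosen only for illustration, and the sketch assumes each field on the page is enclosed by a fixed pair of left and right delimiters.

```python
from typing import List, Tuple


def extract_between(page: str, left: str, right: str) -> List[str]:
    """Return every substring of `page` found between `left` and `right`."""
    values = []
    pos = 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        values.append(page[start:end].strip())
        pos = end + len(right)
    return values


def wrap(page: str, template: List[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Apply a template (one delimiter pair per field) and group the
    extracted values into records, one tuple per row of related fields."""
    columns = [extract_between(page, left, right) for left, right in template]
    return list(zip(*columns))


# Hypothetical page and template, assumed only for this illustration.
SAMPLE_PAGE = """
<ul>
  <li><b>Laptop</b> <i>$900</i></li>
  <li><b>Phone</b> <i>$500</i></li>
</ul>
"""
TEMPLATE = [("<b>", "</b>"), ("<i>", "</i>")]

records = wrap(SAMPLE_PAGE, TEMPLATE)
for name, price in records:
    print(name, price)
```

In this sketch the template is written by hand for one page layout, so it would need to be revised if the page's markup changed.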
