Self Training Wrapper Induction with Linked Data

This work explores the use of Linked Data for Web-scale Information Extraction, with a focus on the task of Wrapper Induction. We show how to use Linked Data effectively to generate training material automatically and build a self-trained Wrapper Induction method. Experiments on a publicly available dataset demonstrate that, for covered domains, our method achieves an F-measure of 0.85, which is competitive with a supervised solution.
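The general idea of self-trained wrapper induction can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes pages have already been parsed into mappings from XPath to text, and it uses a hypothetical `gazetteer` of entity labels standing in for values drawn from Linked Data. The helper names `induce_wrapper`, `apply_wrapper`, and `f_measure` are illustrative only.

```python
from collections import Counter

def induce_wrapper(pages, gazetteer):
    """Return the XPath whose text most often matches a known label.

    pages: list of dicts mapping XPath -> text (assumed pre-parsed).
    gazetteer: set of lowercased labels (assumed to come from Linked Data).
    """
    votes = Counter()
    for page in pages:
        for xpath, text in page.items():
            if text.strip().lower() in gazetteer:
                votes[xpath] += 1  # automatic, possibly noisy annotation
    return votes.most_common(1)[0][0] if votes else None

def apply_wrapper(pages, xpath):
    """Extract the text found at the induced XPath on every page."""
    return [page[xpath].strip() for page in pages if xpath in page]

def f_measure(extracted, gold):
    """Harmonic mean of precision and recall over sets of values."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy data: two titles are known to the gazetteer, one is not.
gazetteer = {"the godfather", "casablanca"}
pages = [
    {"/html/body/h1": "The Godfather", "/html/body/div[2]/a": "Home"},
    {"/html/body/h1": "Casablanca", "/html/body/div[2]/a": "Home"},
    # The sidebar link here is a noisy match for a gazetteer entry.
    {"/html/body/h1": "Blade Runner", "/html/body/div[2]/a": "Casablanca"},
]

wrapper = induce_wrapper(pages, gazetteer)
extracted = apply_wrapper(pages, wrapper)
score = f_measure(extracted, ["The Godfather", "Casablanca", "Blade Runner"])
```

In this toy setting the title XPath collects two matches and the sidebar link only one, so the induced wrapper also extracts the title that the gazetteer does not cover. That is the sense in which automatically generated annotations can replace hand-labelled training pages.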
