Paper information - Early Steps Towards Web Scale Information Extraction with LODIE


Information extraction (IE) is the technique of transforming unstructured textual data into a structured representation that machines can understand. The exponential growth of the Web generates an exceptional quantity of data for which automatic knowledge capture is essential. This work describes the methodology for web-scale information extraction in the LODIE project (Linked Open Data Information Extraction) and highlights results from early experiments carried out in the initial phase of the project. LODIE aims to develop information extraction techniques that scale to the Web and adapt to user information needs. The core idea behind LODIE is to use linked open data, a very large-scale information resource that provides invaluable annotated data on a growing number of domains, as a ground-breaking solution for IE. This article has two objectives: first, to describe the LODIE project as a whole and outline its general challenges and directions; second, to describe some initial steps taken towards the general solution, focusing on a specific IE subtask, wrapper induction.

Ziqi Zhang | Anna Lisa Gentile | Fabio Ciravegna
