Exploiting Linked Open Data to Uncover Entity Types

Extracting structured information from text plays a crucial role in automatic knowledge acquisition and is at the core of any knowledge representation and reasoning system. Traditional methods rely on hand-crafted rules and are restricted by the performance of various linguistic pre-processing tools. More recent approaches rely on supervised learning of relations trained on labelled examples, which can be manually created or sometimes automatically generated (referred as distant supervision). We propose a supervised method for entity typing and alignment. We argue that a rich feature space can improve extraction accuracy and we propose to exploit Linked Open Data (LOD) for feature enrichment. Our approach is tested on task-2 of the Open Knowledge Extraction challenge, including automatic entity typing and alignment. Our approach demonstrate that by combining evidences derived from LOD (e.g. DBpedia) and conventional lexical resources (e.g. WordNet) (i) improves the accuracy of the supervised induction method and (ii) enables easy matching with the Dolce+DnS Ultra Lite ontology classes.

[1]  Jens Lehmann,et al.  DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia , 2015, Semantic Web.

[2]  Kalina Bontcheva,et al.  Adapting SVM for data sparseness and imbalance: a case study in information extraction , 2009, Natural Language Engineering.

[3]  Daniel Jurafsky,et al.  Distant supervision for relation extraction without labeled data , 2009, ACL.

[4]  Lorena Otero-Cerdeira,et al.  Ontology matching: A literature review , 2015, Expert Syst. Appl..

[5]  Christian Jacquemin,et al.  EMPIRICAL OBSERVATION OF TERM VARIATIONS AND PRINCIPLES FOR THEIR DESCRIPTION , 1996 .

[6]  Fausto Giunchiglia,et al.  Discovering Missing Background Knowledge in Ontology Matching , 2006, ECAI.

[7]  Martin Gaedke,et al.  Silk - A Link Discovery Framework for the Web of Data , 2009, LDOW.

[8]  Oren Etzioni,et al.  Identifying Relations for Open Information Extraction , 2011, EMNLP.

[9]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[10]  Luis Gravano,et al.  Snowball: extracting relations from large plain-text collections , 2000, DL '00.

[11]  Oren Etzioni,et al.  Open question answering over curated and extracted knowledge bases , 2014, KDD.

[12]  Xuchen Yao,et al.  Information Extraction over Structured Data: Question Answering with Freebase , 2014, ACL.

[13]  Marouen Kachroudi,et al.  LDOA results for OAEI 2011 , 2011, OM.

[14]  Satoshi Sekine,et al.  A survey of named entity recognition and classification , 2007 .

[15]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[16]  Jens Lehmann,et al.  DBpedia - A crystallization point for the Web of Data , 2009, J. Web Semant..

[17]  Ralph Grishman,et al.  Ensemble Semantics for Large-scale Unsupervised Relation Extraction , 2012, EMNLP.