A 2-phase frame-based knowledge extraction framework

We present an approach for extracting knowledge from English natural language text in which processing is decoupled into two phases. The first phase comprises several standard NLP tasks whose results are integrated into a single RDF graph of mentions. The second phase processes the mention graph with SPARQL-like mapping rules to produce a knowledge graph organized around semantic frames (i.e., prototypical descriptions of events and situations). The decoupling allows: (i) choosing different tools for the NLP tasks without affecting the remaining computation; (ii) combining the outputs of different NLP tasks in non-trivial ways, leveraging their integrated and coherent representation in the mention graph; and (iii) tracing each piece of extracted knowledge back to the mention(s) it comes from, leveraging the single RDF representation. We evaluate the precision and recall of our approach on a gold standard, showing that it is competitive with the state of the art. We also evaluate execution times and (sampled) accuracy on a corpus of 110K Wikipedia pages, showing that the approach is applicable to large corpora.
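
To make the two-phase decoupling concrete, the following is a minimal sketch, not the actual implementation. It assumes plain Python tuples in place of RDF triples and a hand-written Python function in place of a SPARQL-like mapping rule. All identifiers (`build_mention_graph`, `apply_frame_rule`, the `m:` and `i:` prefixes, the `denotedBy` property) are hypothetical, and the frame `Commerce_buy` with roles `Buyer` and `Goods` is used only as an illustrative frame for the sentence "Alice bought a book."

```python
# Toy triples are plain (subject, predicate, object) tuples.
# Every name below is a hypothetical stand-in chosen for illustration.

def build_mention_graph():
    """Phase 1 stand-in: merge hard-coded outputs of two NLP tasks into one graph."""
    # Pretend output of an entity recognizer.
    ner_output = {
        ("m:alice", "type", "EntityMention"),
        ("m:alice", "anchor", "Alice"),
        ("m:book", "type", "EntityMention"),
        ("m:book", "anchor", "book"),
    }
    # Pretend output of a semantic role labeler.
    srl_output = {
        ("m:bought", "type", "PredicateMention"),
        ("m:bought", "frame", "Commerce_buy"),
        ("m:bought", "role:Buyer", "m:alice"),
        ("m:bought", "role:Goods", "m:book"),
    }
    # Swapping either tool only changes how its set is produced, not phase 2.
    return ner_output | srl_output


def instance(mention):
    """Mint an instance identifier from a mention identifier (m:x -> i:x)."""
    return "i:" + mention.split(":", 1)[1]


def apply_frame_rule(mentions):
    """Phase 2 stand-in: one mapping rule from mention triples to frame triples."""
    knowledge = set()
    for s, p, o in mentions:
        if p == "frame":
            # A predicate mention annotated with a frame yields a frame instance.
            knowledge.add((instance(s), "type", "frame:" + o))
            knowledge.add((instance(s), "denotedBy", s))
        elif p.startswith("role:"):
            # A role link between mentions yields a role link between instances.
            knowledge.add((instance(s), p, instance(o)))
            knowledge.add((instance(o), "denotedBy", o))
    return knowledge


if __name__ == "__main__":
    for triple in sorted(apply_frame_rule(build_mention_graph())):
        print(triple)
```

In this sketch the `denotedBy` triples play the part of point (iii): each instance in the output can be traced back to the mention it was derived from. The rule reads only the merged mention graph, so it does not depend on which tool produced a given triple.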
