Enriching Structured Knowledge with Open Information

We propose an approach for semantifying web-extracted facts. In particular, we map the subject and object terms of these facts to instances, and the relational phrases to object properties, defined in a target knowledge base. By doing this, we resolve the ambiguity inherent in web-extracted facts while simultaneously enriching the target knowledge base with a significant number of new assertions. In this paper, we focus on the mapping of relational phrases in the context of the overall workflow. In an open extraction setting, identical semantic relationships can be represented by different surface forms, making it necessary to group these surface forms together. To solve this problem, we propose the use of Markov clustering. We present a complete, ontology-independent, generalized workflow, which we evaluate on facts extracted by NELL and ReVerb. Our target knowledge base is DBpedia. Our evaluation shows promising results in terms of producing highly precise facts. Moreover, the results indicate that clustering the relational phrases pays off in terms of improved instance and property mapping.
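To illustrate how Markov clustering can group surface forms of the same relation, the following is a minimal sketch rather than the implementation evaluated in this work. It assumes a token-level Jaccard similarity between phrases, an inflation parameter of 2.0, and a fixed number of iterations; the function names (`jaccard`, `markov_cluster`, and the helpers) and the example phrases are hypothetical and chosen for illustration only.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity (an assumed similarity measure)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix, i.e. take two-step random walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise each entry to the power r, then renormalize."""
    return normalize_columns([[x ** r for x in row] for row in m])


def markov_cluster(phrases, inflation=2.0, iterations=50, eps=1e-6):
    """Group phrases by alternating expansion and inflation on a similarity graph."""
    n = len(phrases)
    # Similarity graph; the diagonal acts as self-loops since jaccard(p, p) == 1.
    m = [[jaccard(phrases[i], phrases[j]) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(phrases[j] for j in range(n) if row[j] > eps)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical relational phrases, not taken from the NELL or ReVerb data.
phrases = ["is the capital of", "capital of", "was written by", "written by"]
clusters = markov_cluster(phrases)
for cluster in clusters:
    print(sorted(cluster))
```

In this toy input the two groups of phrases share no tokens, so the similarity graph has two disconnected components and the flow simulation cannot merge them. On real extractions the choice of similarity measure and inflation value would determine how fine-grained the resulting clusters are.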
