Scalable Knowledge Graph Construction over Text using Deep Learning based Predicate Mapping

Automatic extraction of information from text and its transformation into a structured format is an important goal in both Semantic Web research and computational linguistics. Knowledge Graphs (KGs) serve as an intuitive way to provide structure to unstructured text. A fact in a KG is expressed as a triple that captures two entities and the relationship (predicate) between them. Multiple triples extracted from text can be semantically identical yet differ in vocabulary, and this vocabulary gap can lead to an explosion in the number of redundant triples. To close the gap, triples need to be mapped to a homogeneous namespace. In this work, we present an end-to-end KG construction system that identifies and extracts entities and relationships from text and maps them to the homogeneous DBpedia namespace. For Predicate Mapping, we propose a Deep Learning architecture to model semantic similarity. This mapping step is computationally heavy, owing to the large number of triples in DBpedia. We identify and prune unnecessary comparisons to make this step scalable. Our experiments show that the proposed approach is able to construct a richer KG at a significantly lower computation cost than previous work.
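The idea of mapping an extracted predicate to a fixed namespace while pruning comparisons can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the system described above: the four-entry catalogue, the type signatures and keyword sets attached to each DBpedia predicate, the pruning rule (compare only against predicates whose subject and object types match), the `threshold` parameter, and the token-overlap (Jaccard) score, which merely stands in for a learned semantic similarity model.

```python
from collections import defaultdict

# Hypothetical mini-catalogue: (predicate, subject type, object type, keywords).
# The type signatures and keyword sets are illustrative assumptions only.
CATALOGUE = [
    ("dbo:birthPlace", "Person", "Place", {"born", "birth", "place", "in"}),
    ("dbo:deathPlace", "Person", "Place", {"died", "death", "place", "in"}),
    ("dbo:author", "Work", "Person", {"written", "author", "by"}),
    ("dbo:spouse", "Person", "Person", {"married", "spouse", "to"}),
]


def build_index(catalogue):
    """Group target predicates by (subject type, object type) for pruning."""
    index = defaultdict(list)
    for predicate, subj_type, obj_type, keywords in catalogue:
        index[(subj_type, obj_type)].append((predicate, keywords))
    return index


def jaccard(a, b):
    """Token-overlap score; a stand-in for a learned similarity model."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_predicate(phrase, subj_type, obj_type, index, threshold=0.2):
    """Return (best predicate or None, best score, number of comparisons)."""
    tokens = set(phrase.lower().split())
    # Pruning: only predicates with a matching type signature are compared.
    candidates = index.get((subj_type, obj_type), [])
    best, best_score = None, 0.0
    for predicate, keywords in candidates:
        score = jaccard(tokens, keywords)
        if score > best_score:
            best, best_score = predicate, score
    if best_score < threshold:
        best = None
    return best, best_score, len(candidates)


INDEX = build_index(CATALOGUE)
print(map_predicate("was born in", "Person", "Place", INDEX))
```

In this toy setting the phrase "was born in" is scored against two candidates instead of all four, because the type-based index excludes predicates whose signatures cannot match. A real system would replace `jaccard` with a trained similarity model and would likely use a different pruning criterion.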
