Building Knowledge Base through Deep Learning Relation Extraction and Wikidata

Many AI agent tasks require a domain-specific knowledge graph (KG) that is both compact and complete. We present a methodology for building a domain-specific KG by merging the output of deep learning-based relation extraction from free text with existing knowledge databases such as Wikidata. We first form a static KG by traversing the knowledge database, constrained by domain keywords. A very large, high-quality training dataset is then generated automatically by matching Common Crawl data against relation keywords extracted from the knowledge database. We describe the training-data generation process in detail, together with subsequent experiments applying deep learning approaches to relation extraction. The resulting model is used to extract new triples from a free-text corpus, forming a dynamic KG. The static and dynamic KGs are then merged into a new knowledge base (KB) that meets the requirements of specific knowledge-oriented AI tasks such as question answering, chat, or intelligent retrieval. The proposed methodology can easily be transferred to other domains or languages.
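The abstract describes three mechanical steps: keyword-constrained traversal of the knowledge database, automatic labeling of web text against known triples, and merging static triples with extracted ones. The sketch below shows one plausible reading of each step under loudly stated assumptions. An in-memory triple list stands in for Wikidata and a short sentence list stands in for Common Crawl. The keyword filter and the substring matcher are hypothetical simplifications, since the abstract does not specify the exact matching rules. A fixed set of triples stands in for the trained relation-extraction model. All function names (`build_static_kg`, `label_sentences`, `merge_kgs`) are illustrative, not the authors' code.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def build_static_kg(kb: Iterable[Triple], seeds: Set[str],
                    domain_keywords: Set[str], max_hops: int = 2) -> Set[Triple]:
    """Breadth-first traversal from seed entities (hypothetical constraint):
    a triple is kept, and its tail explored further, only if a domain keyword
    occurs in its relation or tail label."""
    keys = {k.lower() for k in domain_keywords}
    out_edges: Dict[str, List[Triple]] = {}
    for h, r, t in kb:
        out_edges.setdefault(h, []).append((h, r, t))

    kept: Set[Triple] = set()
    visited: Set[str] = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        entity, depth = frontier.popleft()
        if depth >= max_hops:
            continue  # do not expand beyond the hop limit
        for h, r, t in out_edges.get(entity, []):
            text = f"{r} {t}".lower()
            if not any(k in text for k in keys):
                continue  # off-domain edge: neither kept nor traversed
            kept.add((h, r, t))
            if t not in visited:
                visited.add(t)
                frontier.append((t, depth + 1))
    return kept


def label_sentences(sentences: Iterable[str], triples: Set[Triple],
                    relation_keywords: Dict[str, Set[str]]) -> List[Tuple[str, str, str, str]]:
    """Distant-supervision-style labeling (assumed rule): a sentence becomes a
    training example (sentence, head, tail, relation) when it mentions both
    entities of a known triple and at least one keyword for that relation.
    Matching is naive case-insensitive substring search."""
    examples = []
    for sent in sentences:
        low = sent.lower()
        for h, r, t in sorted(triples):  # sorted for deterministic output
            if (h.lower() in low and t.lower() in low
                    and any(k in low for k in relation_keywords.get(r, ()))):
                examples.append((sent, h, t, r))
    return examples


def merge_kgs(static_kg: Set[Triple], dynamic_kg: Set[Triple]) -> Dict[Triple, Set[str]]:
    """Union of both graphs, recording which source(s) produced each triple."""
    merged: Dict[Triple, Set[str]] = {}
    for triple in static_kg:
        merged.setdefault(triple, set()).add("static")
    for triple in dynamic_kg:
        merged.setdefault(triple, set()).add("dynamic")
    return merged


# Toy data (hypothetical, for illustration only).
kb = [
    ("Aspirin", "treats", "Headache"),
    ("Aspirin", "manufacturer", "Bayer"),
    ("Headache", "symptom of", "Migraine"),
    ("Bayer", "headquarters", "Leverkusen"),
]
static_kg = build_static_kg(kb, seeds={"Aspirin"}, domain_keywords={"treat", "symptom"})

sentences = [
    "Aspirin is widely used to treat a headache.",
    "A throbbing headache is a common symptom of migraine.",
    "Aspirin was first sold by Bayer.",
]
relation_keywords = {"treats": {"treat", "relieve"}, "symptom of": {"symptom"}}
examples = label_sentences(sentences, static_kg, relation_keywords)

# A fixed set stands in for the trained relation-extraction model's predictions.
predicted = {("Ibuprofen", "treats", "Headache"), ("Aspirin", "treats", "Headache")}
merged = merge_kgs(static_kg, predicted)

for triple, sources in sorted(merged.items()):
    print(triple, sorted(sources))
```

Traversing only along kept edges is a design choice that keeps the static KG compact: the off-domain `manufacturer` edge is dropped, so `Bayer` and its neighbors are never visited. Recording provenance in `merge_kgs` lets a downstream task distinguish curated static facts from model-extracted ones.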
