JAKET: Joint Pre-training of Knowledge Graph and Language Understanding

Knowledge graphs (KGs) contain rich information about world knowledge, entities, and relations, and can therefore serve as valuable supplements to existing pre-trained language models. However, efficiently integrating information from a KG into language modeling remains a challenge, and understanding a knowledge graph in turn requires related textual context. We propose JAKET, a novel joint pre-training framework that models both the knowledge graph and language. The knowledge module and the language module provide essential information to mutually assist each other: the knowledge module produces embeddings for entities mentioned in text, while the language module generates context-aware initial embeddings for entities and relations in the graph. This design enables the pre-trained model to easily adapt to unseen knowledge graphs in new domains. Experimental results on several knowledge-aware NLP tasks show that the proposed framework achieves superior performance by effectively leveraging knowledge in language understanding.
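
The mutual-assistance loop described above can be illustrated with a minimal sketch: a language module first encodes entity and relation descriptions to produce context-aware initial KG embeddings, a knowledge module then propagates information over the graph, and the resulting entity embeddings are fused back into the token representations at mention positions. The module names, sizes, toy data, and fusion-by-addition step below are illustrative assumptions for this sketch, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class LanguageModule(nn.Module):
    """Produces contextual token embeddings (stand-in for a pre-trained LM)."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=1,
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.encoder(self.tok(token_ids))  # (batch, seq, dim)


class KnowledgeModule(nn.Module):
    """One round of mean-aggregation message passing over KG triples."""

    def forward(self, ent_emb, rel_emb, triples):
        heads, rels, tails = triples.T
        messages = torch.zeros_like(ent_emb)
        counts = torch.zeros(ent_emb.size(0), 1)
        # Each tail entity receives (head embedding + relation embedding).
        messages.index_add_(0, tails, ent_emb[heads] + rel_emb[rels])
        counts.index_add_(0, tails, torch.ones(len(tails), 1))
        return ent_emb + messages / counts.clamp(min=1)


dim, vocab_size, n_ent, n_rel = 32, 100, 5, 2
lm, km = LanguageModule(vocab_size, dim), KnowledgeModule()

# Step 1: the language module encodes entity/relation description text to
# produce context-aware initial KG embeddings (mean-pooled here).
ent_desc = torch.randint(0, vocab_size, (n_ent, 6))  # toy description token ids
rel_desc = torch.randint(0, vocab_size, (n_rel, 6))
ent_emb = lm(ent_desc).mean(dim=1)                   # (n_ent, dim)
rel_emb = lm(rel_desc).mean(dim=1)                   # (n_rel, dim)

# Step 2: the knowledge module propagates information over KG triples
# given as (head, relation, tail) index rows.
triples = torch.tensor([[0, 0, 1], [1, 1, 2], [3, 0, 4]])
ent_emb = km(ent_emb, rel_emb, triples)

# Step 3: entity embeddings are fused back into the token representations
# at the positions where those entities are mentioned in the input text.
tokens = torch.randint(0, vocab_size, (1, 10))
hidden = lm(tokens).clone()                          # (1, 10, dim)
mention_pos, mention_ent = torch.tensor([2, 7]), torch.tensor([0, 3])
hidden[0, mention_pos] = hidden[0, mention_pos] + ent_emb[mention_ent]
print(hidden.shape)  # torch.Size([1, 10, 32])
```

In this sketch the cycle can be repeated: knowledge-enhanced token representations yield better entity and relation descriptions, which in turn yield better graph embeddings, which is the intuition behind jointly pre-training the two modules.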
