KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding

Contextualized entity representations learned by state-of-the-art deep learning models (BERT, GPT, T5, etc.) leverage the attention mechanism to learn the data context. However, these models still cannot leverage the knowledge context present in knowledge graphs. Knowledge context can be understood as the semantics of entities and their relationships with neighboring entities in a knowledge graph. We propose a novel and effective technique that infuses knowledge context from knowledge graphs, for conceptual and ambiguous entities, into models based on the transformer architecture. Our technique projects knowledge graph embeddings into a homogeneous vector space, introduces new token types for entities, aligns entity position IDs, and adds a selective attention mechanism. We take BERT as the baseline model and implement "Knowledge-Infused BERT" (KI-BERT) by infusing knowledge context from ConceptNet and WordNet. KI-BERT significantly outperforms BERT on a wide range of NLP tasks across eight GLUE datasets. The KI-BERT-base model even outperforms BERT-large on domain-specific tasks such as SciTail and the academic subsets of QQP, QNLI, and MNLI.

Amit Sheth | Keyur Faldu | Prashant Kikani | Hemang Akabari
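The abstract names four ingredients (projected knowledge graph embeddings, new entity token types, aligned entity position IDs, and selective attention) without giving their exact form. The following is a minimal, stdlib-only Python sketch of how such inputs could be assembled, not the authors' implementation. Every concrete choice is our assumption: linked entities are appended after the text, they get a hypothetical token-type id of 2, each entity reuses the position id of the word it annotates, and the visibility mask lets an entity token and its mention see each other while the rest of the sentence cannot see the entity. The names `ENTITY_TOKEN_TYPE`, `project`, `build_inputs` and `masked_softmax` are hypothetical, and the WordNet-style entity name `bank.n.01` is only an example.

```python
import math
from typing import List, Tuple

# Hypothetical token-type id for entity tokens; BERT itself only uses 0 and 1.
ENTITY_TOKEN_TYPE = 2


def project(kg_vec: List[float], weight: List[List[float]]) -> List[float]:
    """Map a KG embedding (size d_kg) into the model's hidden space (size d_model).

    `weight` has d_model rows of length d_kg; in a real model it would be learned.
    """
    return [sum(w * x for w, x in zip(row, kg_vec)) for row in weight]


def build_inputs(
    tokens: List[str],
    segment_ids: List[int],
    entity_links: List[Tuple[int, str]],
) -> Tuple[List[str], List[int], List[int], List[List[bool]]]:
    """Append linked entities after the text and build the extra model inputs.

    entity_links holds (index of the mention token, entity name) pairs.
    Returns tokens, token-type ids, position ids and a boolean visibility mask
    where mask[i][j] is True if position i may attend to position j.
    """
    n = len(tokens)
    all_tokens = list(tokens) + [name for _, name in entity_links]
    token_types = list(segment_ids) + [ENTITY_TOKEN_TYPE] * len(entity_links)
    # Assumption: each entity reuses the position id of the word it annotates.
    positions = list(range(n)) + [idx for idx, _ in entity_links]

    total = len(all_tokens)
    mask = [[False] * total for _ in range(total)]
    # Ordinary tokens see each other exactly as in BERT.
    for i in range(n):
        for j in range(n):
            mask[i][j] = True
    # Assumption: an entity token is visible only to itself and its own mention.
    for k, (idx, _) in enumerate(entity_links):
        e = n + k
        mask[e][e] = True
        mask[e][idx] = True
        mask[idx][e] = True
    return all_tokens, token_types, positions, mask


def masked_softmax(scores: List[List[float]], mask: List[List[bool]]) -> List[List[float]]:
    """Row-wise softmax that gives zero weight to masked-out positions."""
    out = []
    for row, visible in zip(scores, mask):
        peak = max(s for s, v in zip(row, visible) if v)
        exps = [math.exp(s - peak) if v else 0.0 for s, v in zip(row, visible)]
        total = sum(exps)
        out.append([x / total for x in exps])
    return out


if __name__ == "__main__":
    toks, types, pos, mask = build_inputs(
        ["[CLS]", "bank", "loan", "[SEP]"], [0, 0, 0, 0], [(1, "bank.n.01")]
    )
    print(toks)   # ['[CLS]', 'bank', 'loan', '[SEP]', 'bank.n.01']
    print(types)  # [0, 0, 0, 0, 2]
    print(pos)    # [0, 1, 2, 3, 1]
    weights = masked_softmax([[0.0] * len(toks) for _ in toks], mask)
    print(weights[4])  # entity row: [0.0, 0.5, 0.0, 0.0, 0.5]
    # The entity's input vector would come from the projected KG embedding.
    print(project([2.0, 3.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))  # [2.0, 3.0, 5.0]
```

In this sketch, reusing the mention's position id keeps the sentence's own word order untouched, and the restricted mask keeps an injected entity from directly shifting the representations of unrelated words. Both are illustrative design choices, not details confirmed by the abstract.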
