Knowledge Enhanced Pretrained Language Models: A Comprehensive Survey

Pretrained Language Models (PLMs) have established a new paradigm by learning informative contextualized representations from large-scale text corpora. This paradigm has revolutionized the entire field of natural language processing and set new state-of-the-art performance on a wide variety of NLP tasks. However, although PLMs can store certain knowledge and facts from the training corpus, their knowledge awareness is still far from satisfactory. To address this issue, integrating knowledge into PLMs has recently become a very active research area, and a variety of approaches have been developed. In this paper, we provide a comprehensive survey of the literature on this emerging and fast-growing field of Knowledge Enhanced Pretrained Language Models (KE-PLMs). We introduce three taxonomies to categorize existing work. We also survey the various NLU and NLG applications on which KE-PLMs have demonstrated superior performance over vanilla PLMs. Finally, we discuss the challenges facing KE-PLMs as well as promising directions for future research.
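
As an illustration of the general idea behind knowledge integration, the sketch below shows how aligned knowledge-graph entity embeddings might be fused into a PLM's token representations through a gated projection. This is a minimal, hypothetical example in the spirit of entity-injection approaches such as ERNIE and KnowBERT; the class name, dimensions, and fusion scheme are assumptions made for illustration, not the interface of any particular KE-PLM.

```python
# Illustrative sketch only: a minimal knowledge-fusion layer in the spirit of
# entity-injection KE-PLMs. All names and shapes are assumptions for
# illustration, not the API of any specific model.
import torch
import torch.nn as nn


class KnowledgeFusionLayer(nn.Module):
    """Fuses pretrained KG entity embeddings into token-level hidden states."""

    def __init__(self, hidden_dim: int, entity_dim: int):
        super().__init__()
        # Project entity embeddings from the KG space into the LM hidden space.
        self.entity_proj = nn.Linear(entity_dim, hidden_dim)
        # Gate that decides, per token, how much entity knowledge to inject.
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, token_states, entity_embs, entity_mask):
        # token_states: (batch, seq_len, hidden_dim) from the PLM encoder
        # entity_embs:  (batch, seq_len, entity_dim) aligned KG embeddings
        #               (zeros where no entity is linked)
        # entity_mask:  (batch, seq_len, 1) with 1.0 where a token is linked
        projected = self.entity_proj(entity_embs)
        gate = torch.sigmoid(self.gate(torch.cat([token_states, projected], dim=-1)))
        # Residual injection: only linked tokens receive entity information.
        return token_states + entity_mask * gate * projected


# Usage with dummy tensors:
layer = KnowledgeFusionLayer(hidden_dim=768, entity_dim=100)
tokens = torch.randn(2, 16, 768)
entities = torch.randn(2, 16, 100)
mask = (torch.rand(2, 16, 1) > 0.5).float()
fused = layer(tokens, entities, mask)  # shape: (2, 16, 768)
```

The gated residual form is one common design choice in this family of methods: it lets the model fall back to the original contextual representation when the linked entity is uninformative or the entity linker is wrong.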
