A Survey of Knowledge Enhanced Pre-trained Models

Pre-trained models learn contextualized word representations from large-scale text corpora through self-supervised learning and achieve promising performance after fine-tuning. These models, however, suffer from poor robustness and a lack of interpretability. Pre-trained models injected with external knowledge, which we call knowledge enhanced pre-trained models (KEPTMs), exhibit deeper understanding and logical reasoning and introduce interpretability to some extent. In this survey, we provide a comprehensive overview of KEPTMs for natural language processing. We first review the progress of pre-trained models and knowledge representation learning. We then systematically categorize existing KEPTMs from three perspectives. Finally, we outline potential directions for future research on KEPTMs.
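
The following is a minimal, illustrative sketch (not taken from the survey) of the self-supervised masked-language-modeling behavior described above: a pre-trained encoder predicts a masked token from its context, and the same encoder can subsequently be fine-tuned on downstream tasks. It assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint; the example sentence and variable names are illustrative only.

```python
# Sketch: querying a pre-trained masked language model (assumes
# `transformers` and `torch` are installed and the `bert-base-uncased`
# checkpoint is available).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Mask one token and let the contextual encoder fill it in.
text = "Knowledge graphs store facts as [MASK] between entities."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the most probable token there.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()
mask_pos = mask_positions[0, 1]
predicted_id = int(logits[0, mask_pos].argmax())
print(tokenizer.decode([predicted_id]))  # a context-dependent prediction
```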
