A Primer in BERTology: What We Know About How BERT Works