暂无分享,去创建一个
[1] Kentaro Inui,et al. Attention Module is Not Only a Weight: Analyzing Transformers with Vector Norms , 2020, ArXiv.
[2] Anna Rumshisky,et al. Revealing the Dark Secrets of BERT , 2019, EMNLP.
[3] Furu Wei,et al. Visualizing and Understanding the Effectiveness of BERT , 2019, EMNLP.
[4] Yonatan Belinkov,et al. Linguistic Knowledge and Transferability of Contextual Representations , 2019, NAACL.
[5] R. Thomas McCoy,et al. BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance , 2020, BLACKBOXNLP.
[6] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[7] Anna Rumshisky,et al. A Primer in BERTology: What We Know About How BERT Works , 2020, Transactions of the Association for Computational Linguistics.
[8] Robert Frank,et al. Open Sesame: Getting inside BERT’s Linguistic Knowledge , 2019, BlackboxNLP@ACL.
[9] Ali Farhadi,et al. HellaSwag: Can a Machine Really Finish Your Sentence? , 2019, ACL.
[10] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.
[11] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[12] Yang Liu,et al. On Identifiability in Transformers , 2020, ICLR.
[13] Yuval Pinter,et al. Attention is not not Explanation , 2019, EMNLP.
[14] Qun Liu,et al. TinyBERT: Distilling BERT for Natural Language Understanding , 2020, EMNLP.
[15] R. Thomas McCoy,et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference , 2019, ACL.
[16] Philip H. S. Torr,et al. SNIP: Single-shot Network Pruning based on Connection Sensitivity , 2018, ICLR.
[17] Samy Bengio,et al. Are All Layers Created Equal? , 2019, J. Mach. Learn. Res..
[18] Yuandong Tian,et al. One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers , 2019, NeurIPS.
[19] Hector J. Levesque,et al. The Winograd Schema Challenge , 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
[20] Yuandong Tian,et al. Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP , 2019, ICLR.
[21] Shikha Bordia,et al. Do Attention Heads in BERT Track Syntactic Dependencies? , 2019, ArXiv.
[22] Michael Carbin,et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.
[23] Peter Henderson,et al. Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning , 2020, ArXiv.
[24] Ido Dagan,et al. The Third PASCAL Recognizing Textual Entailment Challenge , 2007, ACL-PASCAL@ACL.
[25] Dan Jurafsky,et al. Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models , 2020, EMNLP.
[26] Ido Dagan,et al. The Third PASCAL Recognizing Textual Entailment Challenge , 2007, ACL-PASCAL@ACL.
[27] Hung-Yu Kao,et al. Probing Neural Network Comprehension of Natural Language Arguments , 2019, ACL.
[28] Shiyu Chang,et al. The Lottery Ticket Hypothesis for Pre-trained BERT Networks , 2020, NeurIPS.
[29] Christopher D. Manning,et al. A Structural Probe for Finding Syntax in Word Representations , 2019, NAACL.
[30] Jason Yosinski,et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask , 2019, NeurIPS.
[31] Omer Levy,et al. What Does BERT Look at? An Analysis of BERT’s Attention , 2019, BlackboxNLP@ACL.
[32] M. Tavoni,et al. Country-level social cost of carbon , 2018, Nature Climate Change.
[33] Dan Roth,et al. Cross-Lingual Ability of Multilingual BERT: An Empirical Study , 2019, ICLR.
[34] Yonatan Belinkov,et al. Analyzing the Structure of Attention in a Transformer Language Model , 2019, BlackboxNLP@ACL.
[35] Ali Farhadi,et al. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping , 2020, ArXiv.
[36] Timo Aila,et al. Pruning Convolutional Neural Networks for Resource Efficient Inference , 2016, ICLR.
[37] Yoav Goldberg,et al. Assessing BERT's Syntactic Abilities , 2019, ArXiv.
[38] Noah A. Smith,et al. Is Attention Interpretable? , 2019, ACL.
[39] Omer Levy,et al. Annotation Artifacts in Natural Language Inference Data , 2018, NAACL.
[40] Samuel R. Bowman,et al. Neural Network Acceptability Judgments , 2018, Transactions of the Association for Computational Linguistics.
[41] Peter Szolovits,et al. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment , 2020, AAAI.
[42] Omer Levy,et al. Are Sixteen Heads Really Better than One? , 2019, NeurIPS.
[43] Dipanjan Das,et al. BERT Rediscovers the Classical NLP Pipeline , 2019, ACL.
[44] J. Scott McCarley,et al. Pruning a BERT-based Question Answering Model , 2019, ArXiv.
[45] Peter Clark,et al. The Seventh PASCAL Recognizing Textual Entailment Challenge , 2011, TAC.
[46] Eneko Agirre,et al. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation , 2017, *SEMEVAL.
[47] Anna Rumshisky,et al. Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks , 2020, AAAI.
[48] Chris Brockett,et al. Automatically Constructing a Corpus of Sentential Paraphrases , 2005, IJCNLP.
[49] Ivan Titov,et al. Information-Theoretic Probing with Minimum Description Length , 2020, EMNLP.
[50] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[51] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[52] Byron C. Wallace,et al. Attention is not Explanation , 2019, NAACL.
[53] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[54] Samuel R. Bowman,et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference , 2017, NAACL.
[55] Roy Bar-Haim,et al. The Second PASCAL Recognising Textual Entailment Challenge , 2006 .
[56] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[57] Ido Dagan,et al. The Sixth PASCAL Recognizing Textual Entailment Challenge , 2009, TAC.
[58] Avirup Sil,et al. Structured Pruning of a BERT-based Question Answering Model , 2019 .
[59] Kevin Duh,et al. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning , 2020, RepL4NLP@ACL.
[60] Fedor Moiseev,et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned , 2019, ACL.
[61] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[62] Thomas Wolf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.