Revealing the Dark Secrets of BERT
Olga Kovaleva | Alexey Romanov | Anna Rogers | Anna Rumshisky