How does BERT’s attention change when you fine-tune? An analysis methodology and a case study in negation scope