How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations
Alexander Löser | Felix A. Gers | Benjamin Winter | Betty van Aken
[1] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[2] Richard Socher, et al. The Natural Language Decathlon: Multitask Learning as Question Answering, 2018, ArXiv.
[3] James H. Martin, et al. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2000.
[4] Mitchell P. Marcus, et al. OntoNotes: A Large Training Corpus for Enhanced Processing, 2017.
[5] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[6] Alex Wang, et al. What do you learn from context? Probing for sentence structure in contextualized word representations, 2019, ICLR.
[7] Lotfi A. Zadeh. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, 1997.
[8] Yonatan Belinkov, et al. What do Neural Machine Translation Models Learn about Morphology?, 2017, ACL.
[9] Douwe Kiela, et al. SentEval: An Evaluation Toolkit for Universal Sentence Representations, 2018, LREC.
[10] Ellen M. Voorhees, et al. Overview of TREC 2001, 2001, TREC.
[11] S. P. Lloyd, et al. Least squares quantization in PCM, 1982, IEEE Trans. Inf. Theory.
[12] Yoshua Bengio, et al. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, 2018, EMNLP.
[13] Lukasz Kaiser, et al. Universal Transformers, 2018, ICLR.
[14] Yonatan Belinkov, et al. Linguistic Knowledge and Transferability of Contextual Representations, 2019, NAACL.
[15] Jason Weston, et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks, 2015, ICLR.
[16] Tasha Nagamine, et al. Exploring how deep neural networks form phonemic categories, 2015, INTERSPEECH.
[17] Filip Karlo Dosilovic, et al. Explainable artificial intelligence: A survey, 2018, 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO).
[18] Daniel Jurafsky, et al. Understanding Neural Networks through Representation Erasure, 2016, ArXiv.
[19] Karl Pearson. LIII. On lines and planes of closest fit to systems of points in space, 1901.
[20] Xing Shi, et al. Does String-Based Neural MT Learn Source Syntax?, 2016, EMNLP.
[21] Pierre Comon, et al. Independent component analysis, A new concept?, 1994, Signal Process.
[22] Quoc V. Le, et al. Semi-supervised Sequence Learning, 2015, NIPS.
[23] Yoav Goldberg, et al. Assessing BERT's Syntactic Abilities, 2019, ArXiv.
[24] Franco Turini, et al. A Survey of Methods for Explaining Black Box Models, 2018, ACM Comput. Surv.
[25] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[26] Laurens van der Maaten, et al. Learning a Parametric Embedding by Preserving Local Structure, 2009, AISTATS.
[27] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[28] Lotfi A. Zadeh, et al. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, 1997, Fuzzy Sets Syst.
[29] Sebastian Ruder, et al. Fine-tuned Language Models for Text Classification, 2018, ArXiv.
[30] Dan Roth, et al. Learning Question Classifiers, 2002, COLING.
[31] Zhiyuan Liu, et al. Understanding the Behaviors of BERT in Ranking, 2019, ArXiv.
[32] Willem H. Zuidema, et al. Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure, 2017, J. Artif. Intell. Res.
[33] Percy Liang, et al. Know What You Don’t Know: Unanswerable Questions for SQuAD, 2018, ACL.
[34] Ali Farhadi, et al. Bidirectional Attention Flow for Machine Comprehension, 2016, ICLR.
[35] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[36] Quanshi Zhang, et al. Visual interpretability for deep learning: a survey, 2018, Frontiers of Information Technology & Electronic Engineering.
[37] Byron C. Wallace, et al. Attention is not Explanation, 2019, NAACL.
[38] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[39] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[40] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[41] Zachary Chase Lipton. The mythos of model interpretability, 2016, ACM Queue.