暂无分享,去创建一个
[1] Jonathan Berant,et al. oLMpics-On What Language Model Pre-training Captures , 2019, Transactions of the Association for Computational Linguistics.
[2] Language Modeling Teaches You More than Translation Does : Lessons Learned Through Auxiliary Task Analysis , 2018 .
[3] Ryan Cotterell,et al. Information-Theoretic Probing for Linguistic Structure , 2020, ACL.
[4] Graham Neubig,et al. How Can We Know What Language Models Know? , 2019, Transactions of the Association for Computational Linguistics.
[5] Christopher D. Manning,et al. Arc-swift: A Novel Transition System for Dependency Parsing , 2017, ACL.
[6] Lei Yu,et al. Learning and Evaluating General Linguistic Intelligence , 2019, ArXiv.
[7] Luke S. Zettlemoyer,et al. End-to-end Neural Coreference Resolution , 2017, EMNLP.
[8] Emmanuel Dupoux,et al. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies , 2016, TACL.
[9] Anna Rumshisky,et al. Revealing the Dark Secrets of BERT , 2019, EMNLP.
[10] Jörg Tiedemann,et al. An Analysis of Encoder Representations in Transformer-Based Machine Translation , 2018, BlackboxNLP@EMNLP.
[11] Rico Sennrich,et al. Context-Aware Neural Machine Translation Learns Anaphora Resolution , 2018, ACL.
[12] Yonatan Belinkov,et al. Identifying and Controlling Important Neurons in Neural Machine Translation , 2018, ICLR.
[13] Yann Ollivier,et al. The Description Length of Deep Learning models , 2018, NeurIPS.
[14] Tal Linzen,et al. Targeted Syntactic Evaluation of Language Models , 2018, EMNLP.
[15] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.
[16] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[17] Douwe Kiela,et al. No Training Required: Exploring Random Encoders for Sentence Classification , 2019, ICLR.
[18] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[19] Sebastian Riedel,et al. Language Models as Knowledge Bases? , 2019, EMNLP.
[20] Adam Lopez,et al. Understanding Learning Dynamics Of Language Models with SVCCA , 2018, NAACL.
[21] Christof Monz,et al. The Importance of Being Recurrent for Modeling Hierarchical Structure , 2018, EMNLP.
[22] Maria Leonor Pacheco,et al. of the Association for Computational Linguistics: , 2001 .
[23] Fedor Moiseev,et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned , 2019, ACL.
[24] Thorsten Brants,et al. One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.
[25] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[26] John Hewitt,et al. Designing and Interpreting Probes with Control Tasks , 2019, EMNLP.
[27] Samuel R. Bowman,et al. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis , 2018, BlackboxNLP@EMNLP.
[28] Fei-Fei Li,et al. Visualizing and Understanding Recurrent Networks , 2015, ArXiv.
[29] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[30] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.
[31] Omer Levy,et al. What Does BERT Look at? An Analysis of BERT’s Attention , 2019, BlackboxNLP@ACL.
[32] Rico Sennrich,et al. The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives , 2019, EMNLP.
[33] Antti Honkela,et al. Variational learning and bits-back coding: an information-theoretic view to Bayesian learning , 2004, IEEE Transactions on Neural Networks.
[34] Gemma Boleda,et al. Convolutional Neural Network Language Models , 2016, EMNLP.
[35] Jorma Rissanen,et al. Universal coding, information, prediction, and estimation , 1984, IEEE Trans. Inf. Theory.
[36] Yonatan Belinkov,et al. Analysis Methods in Neural Language Processing: A Survey , 2018, TACL.
[37] Alex Wang,et al. What do you learn from context? Probing for sentence structure in contextualized word representations , 2019, ICLR.
[38] Dmitry P. Vetrov,et al. Variational Dropout Sparsifies Deep Neural Networks , 2017, ICML.
[39] Max Welling,et al. Bayesian Compression for Deep Learning , 2017, NIPS.
[40] Edouard Grave,et al. Colorless Green Recurrent Networks Dream Hierarchically , 2018, NAACL.
[41] Peter Grünwald,et al. A tutorial introduction to the minimum description length principle , 2004, ArXiv.