Language Model Evaluation Beyond Perplexity