The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions
[1] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.
[2] Michael Bloodgood, et al. Analysis of Stopping Active Learning based on Stabilizing Predictions, 2013, CoNLL.
[3] Pasquale Minervini, et al. Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge, 2018, CoNLL.
[4] Iryna Gurevych, et al. Why Comparing Single Performance Scores Does Not Allow to Draw Conclusions About Machine Learning Approaches, 2018, arXiv.
[5] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[6] Luke Zettlemoyer, et al. Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases, 2019, EMNLP.
[7] Ido Dagan, et al. Paraphrase to Explicate: Revealing Implicit Noun-Compound Relations, 2018, ACL.
[8] Iryna Gurevych, et al. Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging, 2017, EMNLP.
[9] Iryna Gurevych, et al. Improving Generalization by Incorporating Coverage in Natural Language Inference, 2019, arXiv.
[10] R. Thomas McCoy, et al. BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance, 2020, BlackboxNLP.
[11] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[12] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[13] Adina Williams, et al. Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition, 2020, ACL.
[14] Samuel R. Bowman, et al. Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks, 2018, arXiv.
[15] Timothy J. Hazen, et al. Robust Natural Language Inference Models with Example Forgetting, 2019, arXiv.
[16] Mohit Bansal, et al. Adversarial NLI: A New Benchmark for Natural Language Understanding, 2020, ACL.
[17] Omer Levy, et al. Annotation Artifacts in Natural Language Inference Data, 2018, NAACL.
[18] R. Thomas McCoy, et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, 2019, ACL.
[19] Ido Dagan, et al. Diversify Your Datasets: Analyzing Generalization via Controlled Variance in Adversarial Datasets, 2019, CoNLL.
[20] Mike Lewis, et al. Generative Question Answering: Learning to Answer the Whole Question, 2018, ICLR.
[21] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[22] Xiang Zhou, et al. What Can We Learn from Collective Human Opinions on Natural Language Inference Data?, 2020, EMNLP.
[23] Ali Farhadi, et al. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping, 2020, arXiv.
[24] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[25] Noah A. Smith, et al. Improving Natural Language Inference with a Pretrained Parser, 2019, arXiv.
[26] Masatoshi Tsuchiya, et al. Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment, 2018, LREC.
[27] Carolyn Penstein Rosé, et al. EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference, 2019, CoNLL.
[28] Yoav Goldberg, et al. Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets, 2019, EMNLP.
[29] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[30] Christopher Potts, et al. Posing Fair Generalization Tasks for Natural Language Inference, 2019, EMNLP.
[31] Mohit Bansal, et al. Simple Compounded-Label Training for Fact Extraction and Verification, 2020, FEVER.
[32] Luke S. Zettlemoyer, et al. AllenNLP: A Deep Semantic Natural Language Processing Platform, 2018, arXiv.
[33] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[34] Haohan Wang, et al. Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual, 2019, EMNLP.
[35] Yash Goyal, et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, 2017, CVPR.
[36] Zachary C. Lipton, et al. How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks, 2018, EMNLP.
[37] Niranjan Balasubramanian, et al. The Fine Line between Linguistic Generalization and Failure in Seq2Seq-Attention Models, 2018, arXiv.
[38] Marco Marelli, et al. A SICK cure for the evaluation of compositional distributional semantic models, 2014, LREC.
[39] Roy Schwartz, et al. Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets, 2019, NAACL.
[40] Matthew C. Makel, et al. Replications in Psychology Research, 2012, Perspectives on Psychological Science.
[41] Allan Jabri, et al. Revisiting Visual Question Answering Baselines, 2016, ECCV.
[42] Carolyn Penstein Rosé, et al. Stress Test Evaluation for Natural Language Inference, 2018, COLING.
[43] Zhen-Hua Ling, et al. Enhanced LSTM for Natural Language Inference, 2016, ACL.
[44] Mohit Bansal, et al. Analyzing Compositionality-Sensitivity of NLI Models, 2018, AAAI.
[45] Percy Liang, et al. Know What You Don’t Know: Unanswerable Questions for SQuAD, 2018, ACL.
[46] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[47] Sebastian Riedel, et al. Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness, 2018, NAACL.
[48] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[49] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[50] Yoav Goldberg, et al. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences, 2018, ACL.
[51] Percy Liang, et al. Adversarial Examples for Evaluating Reading Comprehension Systems, 2017, EMNLP.
[52] Yun-Nung Chen, et al. QAInfomax: Learning Robust Question Answering System by Mutual Information Maximization, 2019, EMNLP.
[53] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.