DQI: A Guide to Benchmark Evaluation
Chitta Baral | Swaroop Mishra | Anjana Arunkumar | Bhavdeep Sachdeva | Chris Bryan
[1] Rachel Rudinger,et al. Hypothesis Only Baselines in Natural Language Inference , 2018, *SEM.
[2] Percy Liang,et al. Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.
[3] Christopher Potts,et al. A large annotated corpus for learning natural language inference , 2015, EMNLP.
[4] G. Ozolins,et al. WHO guidelines for drinking-water quality , 1984, WHO chronicle.
[5] Mohit Bansal,et al. Adversarial NLI: A New Benchmark for Natural Language Understanding , 2020, ACL.
[6] Tania Martellini,et al. Indoor Air Quality and Health , 2017, International journal of environmental research and public health.
[7] Yejin Choi,et al. WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale , 2020, AAAI.
[8] Nathanael Chambers,et al. A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories , 2016, ArXiv.
[9] Yejin Choi,et al. Adversarial Filters of Dataset Biases , 2020, ICML.
[10] Chitta Baral,et al. DQI: Measuring Data Quality in NLP , 2020, ArXiv.
[11] Samuel R. Bowman,et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference , 2017, NAACL.
[12] James Henderson,et al. Simple but effective techniques to reduce biases , 2019, ArXiv.
[13] Haohan Wang,et al. Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual , 2019, EMNLP.
[14] Yi Li,et al. REPAIR: Removing Representation Bias by Dataset Resampling , 2019, CVPR.
[15] Yi Li,et al. RESOUND: Towards Action Recognition Without Representation Bias , 2018, ECCV.
[16] Math Bollen,et al. Understanding Power Quality Problems: Voltage Sags and Interruptions , 1999.
[17] James Y. Zou,et al. Data Shapley: Equitable Valuation of Data for Machine Learning , 2019, ICML.
[18] Chitta Baral,et al. Our Evaluation Metric Needs an Update to Encourage Generalization , 2020, ArXiv.
[19] Eduard Hovy,et al. Learning the Difference that Makes a Difference with Counterfactually-Augmented Data , 2020, ICLR.
[20] Alexei A. Efros,et al. Unbiased look at dataset bias , 2011, CVPR.
[21] Yejin Choi,et al. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference , 2018, EMNLP.
[22] Omer Levy,et al. Annotation Artifacts in Natural Language Inference Data , 2018, NAACL.
[23] Yejin Choi,et al. The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task , 2017, CoNLL.
[24] Luke Zettlemoyer,et al. Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases , 2019, EMNLP.
[25] K. Grunert. Food quality and safety: consumer perception and demand , 2005.
[26] Zachary C. Lipton,et al. How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks , 2018, EMNLP.
[27] Math Bollen,et al. Understanding Power Quality Problems , 1999.
[28] Roy Schwartz,et al. Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets , 2019, NAACL.