Utility is in the Eye of the User: A Critique of NLP Leaderboards