What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?
Nikita Nangia | Saku Sugawara | Harsh Trivedi | Alex Warstadt | Clara Vania | Samuel R. Bowman
[1] Aniket Kittur, et al. CrowdForge: crowdsourcing complex work, 2011, UIST.
[2] George A. Akerlof, et al. The Market for “Lemons”: Quality Uncertainty and the Market Mechanism, 1970.
[3] Gabriel Stanovsky, et al. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs, 2019, NAACL.
[4] David Boud, et al. Enhancing learning through self assessment, 1995.
[5] Ellie Pavlick, et al. Inherent Disagreements in Human Textual Inferences, 2019, Transactions of the Association for Computational Linguistics.
[6] Ido Dagan, et al. Controlled Crowdsourcing for High-Quality QA-SRL Annotation, 2019, ACL.
[7] Nanyun Peng, et al. TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions, 2020, EMNLP.
[8] Guokun Lai, et al. RACE: Large-scale ReAding Comprehension Dataset From Examinations, 2017, EMNLP.
[9] Yoav Goldberg, et al. Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets, 2019, EMNLP.
[10] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[11] Jonathan Berant, et al. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge, 2019, NAACL.
[12] Sebastian Riedel, et al. Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension, 2020, Transactions of the Association for Computational Linguistics.
[13] Zachary C. Lipton, et al. How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks, 2018, EMNLP.
[14] Vikas Sindhwani, et al. Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria, 2009, HLT-NAACL.
[15] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.
[16] Ali Farhadi, et al. HellaSwag: Can a Machine Really Finish Your Sentence?, 2019, ACL.
[17] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[18] Mohit Bansal, et al. Evaluating Interactive Summarization: An Expansion-Based Framework, 2020, arXiv.
[19] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[20] Eunsol Choi, et al. QuAC: Question Answering in Context, 2018, EMNLP.
[21] Mark D. Reckase, et al. Item Response Theory: Parameter Estimation Techniques, 1998.
[22] Anna Rumshisky, et al. Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks, 2020, AAAI.
[23] Hannaneh Hajishirzi, et al. UnifiedQA: Crossing Format Boundaries With a Single QA System, 2020, Findings of EMNLP.
[24] Changjian Chen, et al. An Interactive Method to Improve Crowdsourced Annotations, 2019, IEEE Transactions on Visualization and Computer Graphics.
[25] Michael S. Bernstein, et al. Analytic Methods for Optimizing Realtime Crowdsourcing, 2012, arXiv.
[26] Scott R. Klemmer, et al. Shepherding the crowd: managing and providing feedback to crowd workers, 2011, CHI Extended Abstracts.
[27] Stefan Dietze, et al. Using Worker Self-Assessments for Competence-Based Pre-Selection in Crowdsourcing Microtasks, 2017, ACM Trans. Comput. Hum. Interact.
[28] Jennifer Wortman Vaughan. Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research, 2017, J. Mach. Learn. Res.
[29] Noah A. Smith, et al. Evaluating Models’ Local Decision Boundaries via Contrast Sets, 2020, Findings of EMNLP.
[30] Lydia B. Chilton, et al. MicroTalk: Using Argumentation to Improve Crowdsourcing Accuracy, 2016, HCOMP.
[31] Aniket Kittur, et al. Crowdsourcing user studies with Mechanical Turk, 2008, CHI.
[32] Rachel Rudinger, et al. Hypothesis Only Baselines in Natural Language Inference, 2018, *SEM.
[33] Nancy Ide, et al. Integrating Linguistic Resources: The American National Corpus Model, 2006, LREC.
[34] Mohit Bansal, et al. Adversarial NLI: A New Benchmark for Natural Language Understanding, 2020, ACL.
[35] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
[36] Lawrence S. Moss, et al. OCNLI: Original Chinese Natural Language Inference, 2020, Findings of EMNLP.
[37] Klaus Krippendorff. Content Analysis: An Introduction to Its Methodology, 1980.
[38] Xiang Zhou, et al. What Can We Learn from Collective Human Opinions on Natural Language Inference Data?, 2020, EMNLP.
[39] Scott R. Klemmer, et al. Shepherding the crowd yields better work, 2012, CSCW.
[40] Ryan Cotterell, et al. A Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic, 2014, LREC.
[41] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[42] Samuel R. Bowman, et al. When Do You Need Billions of Words of Pretraining Data?, 2020, ACL.
[43] Yejin Choi, et al. Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning, 2019, EMNLP.
[44] Chris Callison-Burch, et al. Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk, 2009, EMNLP.
[45] Peter Clark, et al. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, 2018, EMNLP.
[46] Hector J. Levesque, et al. The Winograd Schema Challenge, 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
[47] Claire Cardie, et al. Improving Machine Reading Comprehension with General Reading Strategies, 2018, NAACL.
[48] Min-Yen Kan, et al. Perspectives on crowdsourcing annotations for natural language processing, 2012, Language Resources and Evaluation.
[49] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.
[50] Adolfo Martínez Usó, et al. Item response theory in AI: Analysing machine learning classifiers at the instance level, 2019, Artif. Intell.
[51] Hao Wu, et al. Building an Evaluation Scale using Item Response Theory, 2016, EMNLP.
[52] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[53] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[54] Omer Levy, et al. Annotation Artifacts in Natural Language Inference Data, 2018, NAACL.
[55] Jesse Chandler, et al. Risks and Rewards of Crowdsourcing Marketplaces, 2014, Handbook of Human Computation.
[56] Hao Wu, et al. Learning Latent Parameters without Human Response Patterns: Item Response Theory with Artificial Crowds, 2019, EMNLP.
[57] Brendan T. O'Connor, et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, 2008, EMNLP.
[58] Hadas Kotek, et al. Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution, 2020, COLING.