Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension