Semantics Altering Modifications for Evaluating Comprehension in Machine Reading
Viktor Schlegel | Goran Nenadic | Riza Batista-Navarro
[1] Omer Levy et al. Annotation Artifacts in Natural Language Inference Data, 2018, NAACL.
[2] Yoshihiko Hayashi et al. Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension, 2018, COLING.
[3] Ashish Sabharwal et al. What Does My QA Model Know? Devising Controlled Probes Using Expert Knowledge, 2019, Transactions of the Association for Computational Linguistics.
[4] Emiel Krahmer et al. Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation, 2017, J. Artif. Intell. Res.
[5] Christopher Joseph Pal et al. Interactive Language Learning by Question Answering, 2019, EMNLP.
[6] Hector J. Levesque et al. The Winograd Schema Challenge, 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
[7] Yoko Iyeiri et al. Verbs of Implicit Negation and their Complements in the History of English, 2010.
[8] Mohit Bansal et al. Robust Machine Comprehension Models via Adversarial Training, 2018, NAACL.
[9] Omer Levy et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.
[10] Jian Zhang et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[11] Reut Tsarfaty et al. Evaluating NLP Models via Contrast Sets, 2020, ArXiv.
[12] Jason Weston et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks, 2015, ICLR.
[13] Maria Aloni et al. The Cambridge Handbook of Formal Semantics, 2016.
[14] Eduard Hovy et al. Learning the Difference that Makes a Difference with Counterfactually-Augmented Data, 2020, ICLR.
[15] Mohit Bansal et al. Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training, and Model Development for Multi-Hop QA, 2019, ACL.
[16] Sameer Singh et al. Are Red Roses Red? Evaluating Consistency of Question-Answering Models, 2019, ACL.
[17] Ido Dagan et al. Annotating and Predicting Non-Restrictive Noun Phrase Modifications, 2016, ACL.
[18] Shi Feng et al. Misleading Failures of Partial-input Baselines, 2019, ACL.
[19] Danielle S. McNamara et al. The tool for the automatic analysis of text cohesion (TAACO): Automatic assessment of local, global, and text cohesion, 2015, Behavior Research Methods.
[20] Jonathan Berant et al. MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension, 2019, ACL.
[21] Carlos Guestrin et al. Semantically Equivalent Adversarial Rules for Debugging NLP Models, 2018, ACL.
[22] Sebastian Riedel et al. Constructing Datasets for Multi-hop Reading Comprehension Across Documents, 2017, TACL.
[23] Kyunghyun Cho et al. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine, 2017, ArXiv.
[24] Lauri Karttunen et al. Simple and Phrasal Implicatives, 2012, *SEM.
[25] Kevin Gimpel et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[26] Yiming Yang et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[27] Yoshua Bengio et al. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, 2018, EMNLP.
[28] Rachel Rudinger et al. Hypothesis Only Baselines in Natural Language Inference, 2018, *SEM.
[29] Colin Raffel et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[30] Lukasz Kaiser et al. Attention Is All You Need, 2017, NIPS.
[31] R. Thomas McCoy et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, 2019, ACL.
[32] Christian S. Jensen et al. Modification, 1995, The TSQL2 Temporal Query Language.
[33] Kentaro Inui et al. What Makes Reading Comprehension Questions Easier?, 2018, EMNLP.
[34] Yoav Goldberg et al. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences, 2018, ACL.
[35] Percy Liang et al. Adversarial Examples for Evaluating Reading Comprehension Systems, 2017, EMNLP.
[36] Philip Bachman et al. NewsQA: A Machine Comprehension Dataset, 2016, Rep4NLP@ACL.
[37] Mohit Bansal et al. Analyzing Compositionality-Sensitivity of NLI Models, 2018, AAAI.
[38] Akiko Aizawa et al. Evaluation Metrics for Machine Reading Comprehension: Prerequisite Skills and Readability, 2017, ACL.
[39] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[40] Roser Morante et al. ConanDoyle-neg: Annotation of negation cues and their scope in Conan Doyle stories, 2012, LREC.
[41] Goran Nenadic et al. Beyond Leaderboards: A Survey of Methods for Revealing Weaknesses in Natural Language Inference Data and Models, 2020, ArXiv.
[42] Matthew J. Hausknecht et al. TextWorld: A Learning Environment for Text-based Games, 2018, CGW@IJCAI.
[43] Omer Levy et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[44] Livio Robaldo et al. The Penn Discourse TreeBank 2.0, 2008, LREC.
[45] Goran Nenadic et al. A Framework for Evaluation of Machine Reading Comprehension Gold Standards, 2020, LREC.
[46] Mihai Dascalu et al. The Tool for the Automatic Analysis of Cohesion 2.0: Integrating semantic similarity and text overlap, 2018, Behavior Research Methods.
[47] Wentao Ma et al. Benchmarking Robustness of Machine Reading Comprehension Models, 2021, Findings.
[48] Ali Farhadi et al. Bidirectional Attention Flow for Machine Comprehension, 2016, ICLR.
[49] Christopher Potts et al. Posing Fair Generalization Tasks for Natural Language Inference, 2019, EMNLP.
[50] Gabriel Stanovsky et al. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs, 2019, NAACL.