A Survey on Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension
[1] Maxime Cordy,et al. How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks , 2023, ACL.
[2] Wayne Xin Zhao,et al. A Survey of Large Language Models , 2023, ArXiv.
[3] William Yang Wang,et al. STREET: A Multi-Task Structured Reasoning and Explanation Benchmark , 2023, ICLR.
[4] Akiko Aizawa,et al. Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering , 2023, FINDINGS.
[5] Phong Nguyen-Thuan Do,et al. The Impacts of Unanswerable Questions on the Robustness of Machine Reading Comprehension Models , 2023, EACL.
[6] Fei Huang,et al. Reasoning with Language Model Prompting: A Survey , 2022, ACL.
[7] Akiko Aizawa,et al. How Well Do Multi-hop Reading Comprehension Models Understand Date Information? , 2022, AACL.
[8] Mengnan Du,et al. Shortcut Learning of Large Language Models in Natural Language Understanding , 2022, Commun. ACM.
[9] Danqi Chen,et al. Can Rationalization Improve Robustness? , 2022, NAACL.
[10] Dale Schuurmans,et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models , 2022, NeurIPS.
[11] Xuezhi Wang,et al. Measure and Improve Robustness in NLP Models: A Survey , 2021, NAACL.
[12] Yonghong Yan,et al. Decomposing Complex Questions Makes Multi-Hop QA Easier and More Interpretable , 2021, EMNLP.
[13] Xuezhi Wang,et al. Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models , 2021, NAACL-HLT.
[14] D. Wang,et al. More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering , 2021, ArXiv.
[15] Alessandra Russo,et al. Numerical reasoning in machine reading comprehension tasks: are we there yet? , 2021, EMNLP.
[16] Abbas Ghaddar,et al. End-to-End Self-Debiasing Framework for Robust NLU Training , 2021, FINDINGS.
[17] Ashish Sabharwal,et al. ♫ MuSiQue: Multihop Questions via Single-hop Question Composition , 2021, TACL.
[18] Jonathan Berant,et al. Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition , 2021, TACL.
[19] Matt Gardner,et al. QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension , 2021, ACM Comput. Surv..
[20] Chelsea Finn,et al. Just Train Twice: Improving Group Robustness without Training Group Information , 2021, ICML.
[21] Seung-won Hwang,et al. Robustifying Multi-hop QA through Pseudo-Evidentiality Training , 2021, ACL.
[22] Dongyan Zhao,et al. Why Machine Reading Comprehension Models Learn Shortcuts? , 2021, FINDINGS.
[23] Nai Ding,et al. Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models , 2021, ACL.
[24] Eduard Hovy,et al. A Survey of Data Augmentation Approaches for NLP , 2021, FINDINGS.
[25] S. Riedel,et al. Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation , 2021, EMNLP.
[26] Oyvind Tafjord,et al. Explaining Answers with Entailment Trees , 2021, EMNLP.
[27] Zhiyi Ma,et al. Dynabench: Rethinking Benchmarking in NLP , 2021, NAACL.
[28] Jonathan Berant,et al. Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies , 2021, TACL.
[29] Goran Nenadic,et al. Semantics Altering Modifications for Evaluating Comprehension in Machine Reading , 2020, AAAI.
[30] Yonatan Belinkov,et al. Learning from others' mistakes: Avoiding dataset biases without modeling them , 2020, ICLR.
[31] Akiko Aizawa,et al. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps , 2020, COLING.
[32] Iryna Gurevych,et al. Improving QA Generalization by Concurrent Modeling of Multiple Biases , 2020, FINDINGS.
[33] Yu Cheng,et al. InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective , 2020, ICLR.
[34] Iryna Gurevych,et al. Towards Debiasing NLU Models from Unknown Biases , 2020, EMNLP.
[35] Yejin Choi,et al. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics , 2020, EMNLP.
[36] Sebastian Riedel,et al. Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets , 2020, EACL.
[37] Goran Nenadic,et al. Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models , 2020, ArXiv.
[38] Sameer Singh,et al. Beyond Accuracy: Behavioral Testing of NLP Models with CheckList , 2020, ACL.
[39] Jennifer Chu-Carroll,et al. To Test Machine Comprehension, Start by Defining Comprehension , 2020, ACL.
[40] Ashish Sabharwal,et al. Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning , 2020, EMNLP.
[41] Hannaneh Hajishirzi,et al. UnifiedQA: Crossing Format Boundaries With a Single QA System , 2020, FINDINGS.
[42] Iryna Gurevych,et al. Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance , 2020, ACL.
[43] Jaewoo Kang,et al. Look at the First Sentence: Position Bias in Question Answering , 2020, EMNLP.
[44] Ting Liu,et al. Benchmarking Robustness of Machine Reading Comprehension Models , 2020, FINDINGS.
[45] Benjamin Recht,et al. The Effect of Natural Distribution Shift on Question Answering Models , 2020, ICML.
[46] Jianfeng Gao,et al. Adversarial Training for Large Neural Language Models , 2020, ArXiv.
[47] M. Bethge,et al. Shortcut learning in deep neural networks , 2020, Nature Machine Intelligence.
[48] Daniel Khashabi,et al. More Bang for Your Buck: Natural Perturbation for Robust Question Answering , 2020, EMNLP.
[49] Noah A. Smith,et al. Evaluating Models’ Local Decision Boundaries via Contrast Sets , 2020, FINDINGS.
[50] Amir Saffari,et al. What Do Models Learn from Question Answering Datasets? , 2020, EMNLP.
[51] John X. Morris,et al. Reevaluating Adversarial Examples in Natural Language , 2020, FINDINGS.
[52] Goran Nenadic,et al. A Framework for Evaluation of Machine Reading Comprehension Gold Standards , 2020, LREC.
[53] Hwee Tou Ng,et al. Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions? , 2020, EACL.
[54] Kyunghyun Cho,et al. Unsupervised Question Decomposition for Question Answering , 2020, EMNLP.
[55] Ronan Le Bras,et al. Adversarial Filters of Dataset Biases , 2020, ICML.
[56] Sebastian Riedel,et al. Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension , 2020, TACL.
[57] Daniel Deutch,et al. Break It Down: A Question Understanding Benchmark , 2020, TACL.
[58] Hossein Amirkhani,et al. A Survey on Machine Reading Comprehension Systems , 2020, Natural Language Engineering.
[59] Kentaro Inui,et al. Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets , 2019, AAAI.
[60] Eric Nyberg,et al. Bend but Don’t Break? Multi-Challenge Stress Test for QA Models , 2019, EMNLP.
[61] Shuohang Wang,et al. What does BERT Learn from Multiple-Choice Reading Comprehension Datasets? , 2019, ArXiv.
[62] Kentaro Inui,et al. R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason , 2019, ACL.
[63] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[64] Luke Zettlemoyer,et al. Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases , 2019, EMNLP.
[65] Sameer Singh,et al. Universal Adversarial Triggers for Attacking and Analyzing NLP , 2019, EMNLP.
[66] Regina Barzilay,et al. Towards Debiasing Fact Verification Models , 2019, EMNLP.
[67] Ming-Wei Chang,et al. Natural Questions: A Benchmark for Question Answering Research , 2019, TACL.
[68] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[69] Weiming Zhang,et al. Neural Machine Reading Comprehension: Methods and Trends , 2019, Applied Sciences.
[70] Hwee Tou Ng,et al. Improving the Robustness of Question Answering Systems to Question Paraphrasing , 2019, ACL.
[71] Sameer Singh,et al. Are Red Roses Red? Evaluating Consistency of Question-Answering Models , 2019, ACL.
[72] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[73] Hannaneh Hajishirzi,et al. Multi-hop Reading Comprehension through Question Decomposition and Rescoring , 2019, ACL.
[74] Sameer Singh,et al. Compositional Questions Do Not Necessitate Multi-hop Reasoning , 2019, ACL.
[75] Mohit Bansal,et al. Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training, and Model Development for Multi-Hop QA , 2019, ACL.
[76] Simon Ostermann,et al. MCScript2.0: A Machine Comprehension Corpus Focused on Script Events and Participants , 2019, *SEMEVAL.
[77] Roy Schwartz,et al. Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets , 2019, NAACL.
[78] Gabriel Stanovsky,et al. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs , 2019, NAACL.
[79] Claire Cardie,et al. DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension , 2019, TACL.
[80] Quan Z. Sheng,et al. Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey , 2019 .
[81] Przemyslaw Biecek,et al. Are you tough enough? Framework for Robustness Validation of Machine Comprehension Systems , 2018, ArXiv.
[82] Yoshua Bengio,et al. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering , 2018, EMNLP.
[83] Kentaro Inui,et al. What Makes Reading Comprehension Questions Easier? , 2018, EMNLP.
[84] Danqi Chen,et al. CoQA: A Conversational Question Answering Challenge , 2018, TACL.
[85] Eunsol Choi,et al. QuAC: Question Answering in Context , 2018, EMNLP.
[86] Yejin Choi,et al. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference , 2018, EMNLP.
[87] Zachary C. Lipton,et al. How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks , 2018, EMNLP.
[88] Yoshihiko Hayashi,et al. Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension , 2018, COLING.
[89] Carlos Guestrin,et al. Semantically Equivalent Adversarial Rules for Debugging NLP models , 2018, ACL.
[90] Percy Liang,et al. Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.
[91] Dan Roth,et al. Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences , 2018, NAACL.
[92] Mohit Bansal,et al. Robust Machine Comprehension Models via Adversarial Training , 2018, NAACL.
[93] Simon Ostermann,et al. MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge , 2018, LREC.
[94] Sebastian Riedel,et al. Constructing Datasets for Multi-hop Reading Comprehension Across Documents , 2017, TACL.
[95] Percy Liang,et al. Adversarial Examples for Evaluating Reading Comprehension Systems , 2017, EMNLP.
[96] Eunsol Choi,et al. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension , 2017, ACL.
[97] Guokun Lai,et al. RACE: Large-scale ReAding Comprehension Dataset From Examinations , 2017, EMNLP.
[98] Philip Bachman,et al. NewsQA: A Machine Comprehension Dataset , 2016, Rep4NLP@ACL.
[99] Ali Farhadi,et al. Bidirectional Attention Flow for Machine Comprehension , 2016, ICLR.
[100] David A. McAllester,et al. Who did What: A Large-Scale Person-Centered Cloze Dataset , 2016, EMNLP.
[101] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[102] Regina Barzilay,et al. Rationalizing Neural Predictions , 2016, EMNLP.
[103] Marco Tulio Ribeiro,et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, HLT-NAACL Demos.
[104] Jason Weston,et al. The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations , 2015, ICLR.
[105] Phil Blunsom,et al. Teaching Machines to Read and Comprehend , 2015, NIPS.
[106] Jason Weston,et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks , 2015, ICLR.
[107] Matthew Richardson,et al. MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text , 2013, EMNLP.
[108] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.
[109] J. Schilperoord,et al. Linguistics , 1999 .
[110] Emmanouel A. Varvarigos,et al. Survey , 2016, ACM Comput. Surv.
[111] G. Lapalme,et al. Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning , 2022, ACL.
[112] Timothy J. Hazen,et al. Increasing Robustness to Spurious Correlations using Forgettable Examples , 2021, EACL.
[113] Akiko Aizawa,et al. Benchmarking Machine Reading Comprehension: A Psychological Perspective , 2021, EACL.
[114] Viktor Schlegel,et al. Is the Understanding of Explicit Discourse Relations Required in Machine Reading Comprehension? , 2021, EACL.
[115] Ana Marasovi'c,et al. Teach Me to Explain: A Review of Datasets for Explainable NLP , 2021, ArXiv.
[116] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[117] Jifan Chen,et al. Understanding Dataset Design Choices for Multi-hop Reasoning , 2019, NAACL.
[118] Danqi Chen. Neural reading comprehension and beyond , 2018 .
[119] Lucia Specia,et al. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , 2017, EMNLP.