Large Language Models Can Be Easily Distracted by Irrelevant Context
Freda Shi | Xinyun Chen | David Dohan | Denny Zhou | Nathan Scales | Nathanael Schärli | Kanishka Misra | E. Chi
[1] Luke Zettlemoyer, et al. Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters, 2022, ACL.
[2] William W. Cohen, et al. Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, 2022, ArXiv.
[3] Jamie Callan, et al. PAL: Program-aided Language Models, 2022, ICML.
[4] Christopher D. Manning, et al. Holistic Evaluation of Language Models, 2023, Annals of the New York Academy of Sciences.
[5] M. Zaheer, et al. Large Language Models with Controllable Working Memory, 2022, ACL.
[6] Matt Gardner, et al. CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation, 2022, EMNLP.
[7] Andrew M. Dai, et al. Scaling Instruction-Finetuned Language Models, 2022, ArXiv.
[8] Quoc V. Le, et al. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, 2022, ACL.
[9] Noah A. Smith, et al. Measuring and Narrowing the Compositionality Gap in Language Models, 2022, ArXiv.
[10] Hyung Won Chung, et al. Language Models are Multilingual Chain-of-Thought Reasoners, 2022, ICLR.
[11] Allyson Ettinger, et al. COMPS: Conceptual Minimal Pair Sentences for testing Robust Property Knowledge and its Inheritance in Pre-trained Language Models, 2022, EACL.
[12] Ashish Sabharwal, et al. Decomposed Prompting: A Modular Approach for Solving Complex Tasks, 2022, ICLR.
[13] Xinyun Chen, et al. Compositional Semantic Parsing with Large Language Models, 2022, ArXiv.
[14] Aman Madaan, et al. Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango, 2022, ArXiv.
[15] Shafiq R. Joty, et al. FOLIO: Natural Language Reasoning with First-Order Logic, 2022, ArXiv.
[16] Raphael Gontijo Lopes, et al. Language Model Cascades, 2022, ArXiv.
[17] D. Schuurmans, et al. Rationale-Augmented Ensembles in Language Models, 2022, ArXiv.
[18] Kang Min Yoo, et al. Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations, 2022, EMNLP.
[19] S. Gu, et al. Large Language Models are Zero-Shot Reasoners, 2022, NeurIPS.
[20] I. Higgins, et al. Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning, 2022, ICLR.
[21] Sida I. Wang, et al. Natural Language to Code Translation with Execution, 2022, EMNLP.
[22] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[23] D. Schuurmans, et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models, 2022, ICLR.
[24] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[25] M. Lewis, et al. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?, 2022, EMNLP.
[26] J. Steinhardt, et al. Capturing Failures of Large Language Models via Human Cognitive Biases, 2022, NeurIPS.
[27] Dale Schuurmans, et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models, 2022, NeurIPS.
[28] David Bieber, et al. Show Your Work: Scratchpads for Intermediate Computation with Language Models, 2021, ArXiv.
[29] Zhe Gan, et al. Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models, 2021, NeurIPS Datasets and Benchmarks.
[30] Mohammad Bavarian, et al. Training Verifiers to Solve Math Word Problems, 2021, ArXiv.
[31] Alexander M. Rush, et al. Multitask Prompted Training Enables Zero-Shot Task Generalization, 2021, ICLR.
[32] Allyson Ettinger, et al. Sorting through the noise: Testing robustness of information processing in pre-trained language models, 2021, EMNLP.
[33] Vikram Pudi, et al. Adversarial Examples for Evaluating Math Word Problem Solvers, 2021, EMNLP.
[34] Quoc V. Le, et al. Finetuned Language Models Are Zero-Shot Learners, 2021, ICLR.
[35] Ellie Pavlick, et al. Do Prompt-Based Models Really Understand the Meaning of Their Prompts?, 2021, NAACL.
[36] Charles Sutton, et al. Program Synthesis with Large Language Models, 2021, ArXiv.
[37] Navin Goyal, et al. Are NLP Models really able to Solve Simple Math Word Problems?, 2021, NAACL.
[38] Peter Clark, et al. ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language, 2020, Findings of ACL.
[39] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[40] John X. Morris, et al. TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP, 2020, EMNLP.
[41] Oyvind Tafjord, et al. Transformers as Soft Reasoners over Language, 2020, IJCAI.
[42] Hinrich Schütze, et al. Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly, 2019, ACL.
[43] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[44] Joelle Pineau, et al. CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text, 2019, EMNLP.
[45] Gabriel Stanovsky, et al. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs, 2019, NAACL.
[46] Weizhu Chen, et al. Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering, 2018, NAACL.
[47] Yuning Jiang, et al. Learning Visually-Grounded Semantics from Contrastive Adversarial Samples, 2018, COLING.
[48] Dan Roth, et al. Learning What is Essential in Questions, 2017, CoNLL.
[49] Percy Liang, et al. Adversarial Examples for Evaluating Reading Comprehension Systems, 2017, EMNLP.
[50] Wang Ling, et al. Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, 2017, ACL.
[51] Jason Weston, et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks, 2015, ICLR.
[52] Daniela Lucangeli, et al. The Disturbing Effect of Irrelevant Information on Arithmetic Problem Solving in Inattentive Children, 2002, Developmental Neuropsychology.
[53] Cesare Cornoldi, et al. Working memory and intrusions of irrelevant information in a group of specific poor problem solvers, 1999, Memory & Cognition.
[54] W. J. Hoyer, et al. Effects of varying irrelevant information on adult age differences in problem solving, 1979, Journal of Gerontology.
[55] R. Chaves, et al. Look at that! BERT can be easily distracted from paying attention to morphosyntax, 2021, SCiL.