Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, Yoon Kim