What Algorithms can Transformers Learn? A Study in Length Generalization
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, O. Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
[1] Pranjal Awasthi, et al. Improving Length-Generalization in Transformers via Task Hinting, 2023, arXiv.
[2] Eran Malach. Auto-Regressive Next-Token Predictors are Universal Learners, 2023, arXiv.
[3] Boyuan Chen, et al. Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks, 2023, arXiv.
[4] Max Tegmark, et al. The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks, 2023, arXiv.
[5] Siva Reddy, et al. The Impact of Positional Encoding on Length Generalization in Transformers, 2023, NeurIPS.
[6] Ronan Le Bras, et al. Faith and Fate: Limits of Transformers on Compositionality, 2023, NeurIPS.
[7] Mehdi Abbana Bennani, et al. Randomized Positional Encodings Boost Length Generalization of Transformers, 2023, ACL.
[8] Seyed Mehran Kazemi, et al. Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples, 2023, NeurIPS.
[9] Michael Hanna, et al. How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model, 2023, arXiv.
[10] J. Steinhardt, et al. Progress measures for grokking via mechanistic interpretability, 2023, ICLR.
[11] Tom McGrath, et al. Tracr: Compiled Transformers as a Laboratory for Interpretability, 2023, NeurIPS.
[12] D. Schuurmans, et al. What learning algorithm is in-context learning? Investigations with linear models, 2022, ICLR.
[13] P. Blunsom, et al. Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions, 2022, ACL.
[14] Noah A. Smith, et al. Measuring and Narrowing the Compositionality Gap in Language Models, 2022, EMNLP.
[15] Tom B. Brown, et al. In-context Learning and Induction Heads, 2022, arXiv.
[16] Aman Madaan, et al. Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango, 2022, arXiv.
[17] M. Shanahan, et al. Faithful Reasoning Using Large Language Models, 2022, arXiv.
[18] Percy Liang, et al. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes, 2022, NeurIPS.
[19] Yuhuai Wu, et al. Exploring Length Generalization in Large Language Models, 2022, NeurIPS.
[20] Yuhuai Wu, et al. Solving Quantitative Reasoning Problems with Language Models, 2022, NeurIPS.
[21] D. Schuurmans, et al. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, 2022, ICLR.
[22] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, JMLR.
[23] Peter A. Cholak, et al. Overcoming a Theoretical Limitation of Self-Attention, 2022, ACL.
[24] Dale Schuurmans, et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models, 2022, NeurIPS.
[25] François Charton. Linear algebra with transformers, 2021, TMLR.
[26] David Bieber, et al. Show Your Work: Scratchpads for Intermediate Computation with Language Models, 2021, arXiv.
[27] Mohammad Bavarian, et al. Training Verifiers to Solve Math Word Problems, 2021, arXiv.
[28] Yejin Choi, et al. Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics, 2021, AAAI.
[29] Noah A. Smith, et al. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, 2021, ICLR.
[30] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, arXiv.
[31] Noah A. Smith, et al. Saturated Transformers are Constant-Depth Threshold Circuits, 2021, TACL.
[32] Eran Yahav, et al. Thinking Like Transformers, 2021, ICML.
[33] Charles Blundell, et al. Neural algorithmic reasoning, 2021, Patterns.
[34] Rodrigo Nogueira, et al. Investigating the Limitations of Transformers with Simple Arithmetic Tasks, 2021, arXiv:2102.13019.
[35] Wei Zhang, et al. How Can Self-Attention Networks Recognize Dyck-n Languages?, 2020, Findings of EMNLP.
[36] Navin Goyal, et al. On the Ability and Limitations of Transformers to Recognize Formal Languages, 2020, EMNLP.
[37] Marc van Zee, et al. Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures, 2020, arXiv.
[38] Nikolaos Pappas, et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, 2020, ICML.
[39] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[40] Michael Hahn. Theoretical Limitations of Self-Attention in Neural Sequence Models, 2019, TACL.
[41] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[42] Lukasz Kaiser, et al. Neural GPUs Learn Algorithms, 2015, ICLR.
[43] Sanjeev Arora, et al. Computational Complexity: A Modern Approach, 2009.
[44] Generalization, 1984.
[45] Ray J. Solomonoff. A Formal Theory of Inductive Inference. Part II, 1964, Inf. Control.
[46] James L. McClelland, et al. Representations and Computations in Transformers that Support Generalization on Structured Tasks, 2023.
[47] P. Barceló, et al. Attention is Turing-Complete, 2021, JMLR.
[48] Sung-Hyon Myaeng, et al. Have You Seen That Number? Investigating Extrapolation in Question Answering Models, 2021, EMNLP.
[49] G. Eijk. Algorithmic reasoning, 2020.
[50] S. Shalev-Shwartz, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[51] A. Shiryayev. On Tables of Random Numbers, 1993.