Exploring Length Generalization in Large Language Models
Yuhuai Wu | Behnam Neyshabur | Cem Anil | Aitor Lewkowycz | Ethan Dyer | Guy Gur-Ari | V. Ramasesh | Vedant Misra | Ambrose Slone | Anders Andreassen
[1] Yuhuai Wu, et al. Solving Quantitative Reasoning Problems with Language Models, 2022, NeurIPS.
[2] Sébastien Bubeck, et al. Unveiling Transformers with LEGO: a synthetic reasoning task, 2022, ArXiv.
[3] Matt Gardner, et al. Impact of Pretraining Term Frequencies on Few-Shot Reasoning, 2022, ArXiv.
[4] Furong Huang, et al. End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking, 2022, ArXiv.
[5] Dale Schuurmans, et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models, 2022, ArXiv.
[6] Renelito Delos Santos, et al. LaMDA: Language Models for Dialog Applications, 2022, ArXiv.
[7] Noah A. Smith, et al. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, 2021, ICLR.
[8] David Bieber, et al. Show Your Work: Scratchpads for Intermediate Computation with Language Models, 2021, ArXiv.
[9] Charles Sutton, et al. Program Synthesis with Large Language Models, 2021, ArXiv.
[10] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, ArXiv.
[11] Uzi Vishkin, et al. Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks, 2021, NeurIPS.
[12] Jason Weston, et al. Staircase Attention for Recurrent Processing of Sequences, 2021, NeurIPS.
[13] Dawn Song, et al. Measuring Mathematical Problem Solving With the MATH Dataset, 2021, NeurIPS Datasets and Benchmarks.
[14] Behnam Neyshabur, et al. Understanding the Failure Modes of Out-of-Distribution Generalization, 2020, ICLR.
[15] Eli A. Meirom, et al. From Local Structures to Size Generalization in Graph Neural Networks, 2020, ICML.
[16] Jimmy Ba, et al. INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving, 2020, ICLR.
[17] E. Kharitonov, et al. What they do when in doubt: a study of inductive biases in seq2seq learners, 2020, ICLR.
[18] Benjamin Newman, et al. The EOS Decision and Length Extrapolation, 2020, BlackboxNLP.
[19] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[20] Oyvind Tafjord, et al. Transformers as Soft Reasoners over Language, 2020, IJCAI.
[21] R. Thomas McCoy, et al. Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks, 2020, TACL.
[22] Elia Bruni, et al. Location Attention for Extrapolation to Longer Sequences, 2019, ACL.
[23] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[24] Rajarshi Das, et al. Do Multi-hop Readers Dream of Reasoning Chains?, 2019, EMNLP.
[25] Haohan Wang, et al. Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual, 2019, EMNLP.
[26] Yonatan Belinkov, et al. LSTM Networks Can Perform Dynamic Counting, 2019, Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges.
[27] R. Thomas McCoy, et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, 2019, ACL.
[28] Yee Whye Teh, et al. Set Transformer, 2018, ICML.
[29] David Chiang, et al. Correcting Length Bias in Neural Machine Translation, 2018, WMT.
[30] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[31] Lukasz Kaiser, et al. Neural GPUs Learn Algorithms, 2015, ICLR.
[32] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.