Scaling Laws and Interpretability of Learning from Repeated Data
Tom B. Brown | Dario Amodei | T. Henighan | Benjamin Mann | Sam McCandlish | C. Olah | Catherine Olsson | S. El-Showk | Dawn Drain | Danny Hernandez | Tom Conerly | Nova DasSarma | Nelson Elhage | Zac Hatfield-Dodds | Tristan Hume | Scott Johnston | Nicholas Joseph | Jared Kaplan