Emergent Abilities of Large Language Models
J. Dean | Oriol Vinyals | Dani Yogatama | P. Liang | Colin Raffel | W. Fedus | Donald Metzler | Yi Tay | Barret Zoph | Ed Chi | Denny Zhou | Sebastian Borgeaud | Jason Wei | Tatsunori Hashimoto | Rishi Bommasani | Maarten Bosma
[1] Quoc V. Le, et al. Transcending Scaling Laws with 0.1% Extra Compute, 2022, arXiv:2210.11399.
[2] Andrew M. Dai, et al. Scaling Instruction-Finetuned Language Models, 2022, arXiv.
[3] Quoc V. Le, et al. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, 2022, ACL.
[4] Hyung Won Chung, et al. Language Models are Multilingual Chain-of-Thought Reasoners, 2022, ICLR.
[5] Mayee F. Chen, et al. Ask Me Anything: A simple strategy for prompting language models, 2022, arXiv.
[6] Tom B. Brown, et al. In-context Learning and Induction Heads, 2022, arXiv.
[7] Tom B. Brown, et al. Language Models (Mostly) Know What They Know, 2022, arXiv.
[8] Gerard de Melo, et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, 2022, arXiv.
[9] S. Gu, et al. Large Language Models are Zero-Shot Reasoners, 2022, arXiv.
[10] D. Schuurmans, et al. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, 2022, ICLR.
[11] Christopher D. Manning. Human Language Understanding & Reasoning, 2022, Daedalus.
[12] J. Dean, et al. Designing Effective Sparse Expert Models, 2022, IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[13] Oriol Vinyals, et al. Flamingo: a Visual Language Model for Few-Shot Learning, 2022, arXiv.
[14] Prafulla Dhariwal, et al. Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022, arXiv.
[15] Hyung Won Chung, et al. What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?, 2022, ICML.
[16] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[17] James L. McClelland, et al. Can language models learn from explanations in context?, 2022, EMNLP.
[18] S. Levine, et al. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, 2022, CoRL.
[19] Adrian S. Wong, et al. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, 2022, ICLR.
[20] Lisa Anne Hendricks, et al. Training Compute-Optimal Large Language Models, 2022, arXiv.
[21] D. Schuurmans, et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models, 2022, arXiv.
[22] Carrie J. Cai, et al. PromptChainer: Chaining Large Language Model Prompts through Visual Programming, 2022, CHI Extended Abstracts.
[23] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[24] M. Lewis, et al. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?, 2022, EMNLP.
[25] Tom B. Brown, et al. Predictability and Surprise in Large Generative Models, 2022, FAccT.
[26] Nicholas Carlini, et al. Quantifying Memorization Across Neural Language Models, 2022, arXiv.
[27] Matt Gardner, et al. Impact of Pretraining Term Frequencies on Few-Shot Reasoning, 2022, arXiv.
[28] William W. Cohen, et al. Transformer Memory as a Differentiable Search Index, 2022, NeurIPS.
[29] Nikhil Kandpal, et al. Deduplicating Training Data Mitigates Privacy Risks in Language Models, 2022, arXiv:2202.06539.
[30] Geoffrey Irving, et al. Red Teaming Language Models with Language Models, 2022, EMNLP.
[31] Dale Schuurmans, et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models, 2022, arXiv.
[32] Renelito Delos Santos, et al. LaMDA: Language Models for Dialog Applications, 2022, arXiv.
[33] P. Abbeel, et al. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents, 2022, ICML.
[34] Percy Liang, et al. CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities, 2022, CHI.
[35] Quoc V. Le, et al. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts, 2021, ICML.
[36] Diego de Las Casas, et al. Improving language models by retrieving from trillions of tokens, 2021, ICML.
[37] Sang Michael Xie, et al. An Explanation of In-context Learning as Implicit Bayesian Inference, 2021, ICLR.
[38] Alexander M. Rush, et al. Multitask Prompted Training Enables Zero-Shot Task Generalization, 2021, ICLR.
[39] Phu Mon Htut, et al. BBQ: A hand-built bias benchmark for question answering, 2021, Findings of ACL.
[40] Carrie J. Cai, et al. AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts, 2021, CHI.
[41] Owain Evans, et al. TruthfulQA: Measuring How Models Mimic Human Falsehoods, 2021, ACL.
[42] Quoc V. Le, et al. Finetuned Language Models Are Zero-Shot Learners, 2021, ICLR.
[43] Luke Zettlemoyer, et al. Noisy Channel Language Model Prompting for Few-Shot Text Classification, 2021, ACL.
[44] Nicholas Carlini, et al. Deduplicating Training Data Makes Language Models Better, 2021, ACL.
[45] Shachar Mirkin, et al. Emergent Structures and Training Dynamics in Large Language Models, 2022, BigScience.
[46] Vinh Q. Tran, et al. Unifying Language Learning Paradigms, 2022, arXiv.
[47] Ellie Pavlick, et al. Mapping Language Models to Grounded Conceptual Spaces, 2022, ICLR.
[48] Xi Victoria Lin, et al. Efficient Large Scale Language Modeling with Mixtures of Experts, 2021, EMNLP.
[49] Po-Sen Huang, et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher, 2021, arXiv.
[50] Po-Sen Huang, et al. Ethical and social risks of harm from Language Models, 2021, arXiv.
[51] Dario Amodei, et al. A General Language Assistant as a Laboratory for Alignment, 2021, arXiv.
[52] David Bieber, et al. Show Your Work: Scratchpads for Intermediate Computation with Language Models, 2021, arXiv.
[53] Mohammad Bavarian, et al. Training Verifiers to Solve Math Word Problems, 2021, arXiv.
[54] Nicholas Carlini, et al. Unsolved Problems in ML Safety, 2021, arXiv.
[55] Ellie Pavlick, et al. Frequency Effects on Syntactic Rule Learning in Transformers, 2021, EMNLP.
[56] Michael S. Bernstein, et al. On the Opportunities and Risks of Foundation Models, 2021, arXiv.
[57] Daphne Ippolito, et al. Wordcraft: a Human-AI Collaborative Editor for Story Writing, 2021, arXiv.
[58] Sang Michael Xie, et al. Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning, 2021, NeurIPS.
[59] Luke Zettlemoyer, et al. Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right, 2021, EMNLP.
[60] Emily M. Bender, et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, 2021, FAccT.
[61] D. Klein, et al. Calibrate Before Use: Improving Few-Shot Performance of Language Models, 2021, ICML.
[62] Laria Reynolds, et al. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm, 2021, CHI Extended Abstracts.
[63] Noam M. Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, J. Mach. Learn. Res.
[64] Danqi Chen, et al. Making Pre-trained Language Models Better Few-shot Learners, 2021, ACL.
[65] Colin Raffel, et al. Extracting Training Data from Large Language Models, 2020, USENIX Security Symposium.
[66] Samuel R. Bowman, et al. When Do You Need Billions of Words of Pretraining Data?, 2020, ACL.
[67] Sanjeev Arora, et al. A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks, 2020, ICLR.
[68] Hinrich Schütze, et al. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners, 2020, NAACL.
[69] Dawn Song, et al. Measuring Massive Multitask Language Understanding, 2020, ICLR.
[70] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[71] Yejin Choi, et al. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models, 2020, Findings of EMNLP.
[72] Omer Levy, et al. Emergent linguistic structure in artificial neural networks trained by self-supervision, 2020, Proceedings of the National Academy of Sciences.
[73] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[74] Ming-Wei Chang, et al. REALM: Retrieval-Augmented Language Model Pre-Training, 2020, ICML.
[75] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, arXiv.
[76] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[77] José Camacho-Collados, et al. WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations, 2018, NAACL.
[78] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[79] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[80] Richard Socher, et al. The Natural Language Decathlon: Multitask Learning as Question Answering, 2018, arXiv.
[81] Quoc V. Le, et al. A Simple Method for Commonsense Reasoning, 2018, arXiv.
[82] Nathaniel J. Smith, et al. Bootstrapping language acquisition, 2017, Cognition.
[83] Chandler May, et al. Social Bias in Elicited Natural Language Inferences, 2017, EthNLP@EACL.
[84] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[85] Alex Graves, et al. Decoupled Neural Interfaces using Synthetic Gradients, 2016, ICML.
[86] Alex Graves, et al. Adaptive Computation Time for Recurrent Neural Networks, 2016, arXiv.
[87] Paul A. Lewis, et al. New Perspectives on Emergence in Economics, 2012.
[88] H. Hwang, et al. Basic Notions, 2022.
[89] Timothy O'Connor, et al. Emergence in Science and Philosophy, 2010.
[90] Percy Liang, et al. Semi-Supervised Learning for Natural Language, 2005.
[91] Scott Miller, et al. Name Tagging with Word Clusters and Discriminative Training, 2004, NAACL.
[92] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[93] James H. Martin, et al. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2000.
[94] Stephanie Forrest, et al. Emergent computation: self-organizing, collective, and cooperative phenomena in natural and artificial computing networks, 1990.
[95] Tad Hogg, et al. Phase Transitions in Artificial Intelligence Systems, 1987, Artif. Intell.
[96] Philip W. Anderson. More Is Different: Broken symmetry and the nature of the hierarchical structure of science, 1972.