Emergent Abilities of Large Language Models
J. Dean | Oriol Vinyals | Dani Yogatama | P. Liang | Colin Raffel | W. Fedus | Donald Metzler | Yi Tay | Barret Zoph | Ed Chi | Denny Zhou | Sebastian Borgeaud | Jason Wei | Tatsunori Hashimoto | Rishi Bommasani | Maarten Bosma
[1] Quoc V. Le, et al. Transcending Scaling Laws with 0.1% Extra Compute, 2022, arXiv:2210.11399.
[2] Andrew M. Dai, et al. Scaling Instruction-Finetuned Language Models, 2022, arXiv.
[3] Quoc V. Le, et al. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, 2022, ACL.
[4] Hyung Won Chung, et al. Language Models are Multilingual Chain-of-Thought Reasoners, 2022, ICLR.
[5] Mayee F. Chen, et al. Ask Me Anything: A simple strategy for prompting language models, 2022, arXiv.
[6] Tom B. Brown, et al. In-context Learning and Induction Heads, 2022, arXiv.
[7] Tom B. Brown, et al. Language Models (Mostly) Know What They Know, 2022, arXiv.
[8] Gerard de Melo, et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, 2022, arXiv.
[9] S. Gu, et al. Large Language Models are Zero-Shot Reasoners, 2022, arXiv.
[10] D. Schuurmans, et al. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, 2022, ICLR.
[11] Christopher D. Manning. Human Language Understanding & Reasoning, 2022, Daedalus.
[12] J. Dean, et al. Designing Effective Sparse Expert Models, 2022, IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[13] Oriol Vinyals, et al. Flamingo: a Visual Language Model for Few-Shot Learning, 2022, arXiv.
[14] Prafulla Dhariwal, et al. Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022, arXiv.
[15] Hyung Won Chung, et al. What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?, 2022, ICML.
[16] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[17] James L. McClelland, et al. Can language models learn from explanations in context?, 2022, EMNLP.
[18] S. Levine, et al. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, 2022, CoRL.
[19] Adrian S. Wong, et al. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, 2022, ICLR.
[20] Lisa Anne Hendricks, et al. Training Compute-Optimal Large Language Models, 2022, arXiv.
[21] D. Schuurmans, et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models, 2022, arXiv.
[22] Carrie J. Cai, et al. PromptChainer: Chaining Large Language Model Prompts through Visual Programming, 2022, CHI Extended Abstracts.
[23] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[24] M. Lewis, et al. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?, 2022, EMNLP.
[25] Tom B. Brown, et al. Predictability and Surprise in Large Generative Models, 2022, FAccT.
[26] Nicholas Carlini, et al. Quantifying Memorization Across Neural Language Models, 2022, arXiv.
[27] Matt Gardner, et al. Impact of Pretraining Term Frequencies on Few-Shot Reasoning, 2022, arXiv.
[28] William W. Cohen, et al. Transformer Memory as a Differentiable Search Index, 2022, NeurIPS.
[29] Nikhil Kandpal, et al. Deduplicating Training Data Mitigates Privacy Risks in Language Models, 2022, arXiv:2202.06539.
[30] Geoffrey Irving, et al. Red Teaming Language Models with Language Models, 2022, EMNLP.
[31] Dale Schuurmans, et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models, 2022, arXiv.
[32] Renelito Delos Santos, et al. LaMDA: Language Models for Dialog Applications, 2022, arXiv.
[33] P. Abbeel, et al. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents, 2022, ICML.
[34] Percy Liang, et al. CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities, 2022, CHI.
[35] Quoc V. Le, et al. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts, 2021, ICML.
[36] Diego de Las Casas, et al. Improving language models by retrieving from trillions of tokens, 2021, ICML.
[37] Sang Michael Xie, et al. An Explanation of In-context Learning as Implicit Bayesian Inference, 2021, ICLR.
[38] Alexander M. Rush, et al. Multitask Prompted Training Enables Zero-Shot Task Generalization, 2021, ICLR.
[39] Phu Mon Htut, et al. BBQ: A hand-built bias benchmark for question answering, 2021, Findings of ACL.
[40] Carrie J. Cai, et al. AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts, 2021, CHI.
[41] Owain Evans, et al. TruthfulQA: Measuring How Models Mimic Human Falsehoods, 2021, ACL.
[42] Quoc V. Le, et al. Finetuned Language Models Are Zero-Shot Learners, 2021, ICLR.
[43] Luke Zettlemoyer, et al. Noisy Channel Language Model Prompting for Few-Shot Text Classification, 2021, ACL.
[44] Nicholas Carlini, et al. Deduplicating Training Data Makes Language Models Better, 2021, ACL.
[45] Shachar Mirkin, et al. Emergent Structures and Training Dynamics in Large Language Models, 2022, BigScience.
[46] Vinh Q. Tran, et al. Unifying Language Learning Paradigms, 2022, arXiv.
[47] Ellie Pavlick, et al. Mapping Language Models to Grounded Conceptual Spaces, 2022, ICLR.
[48] Xi Victoria Lin, et al. Efficient Large Scale Language Modeling with Mixtures of Experts, 2021, EMNLP.
[49] Po-Sen Huang, et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher, 2021, arXiv.
[50] Po-Sen Huang, et al. Ethical and social risks of harm from Language Models, 2021, arXiv.
[51] Dario Amodei, et al. A General Language Assistant as a Laboratory for Alignment, 2021, arXiv.
[52] David Bieber, et al. Show Your Work: Scratchpads for Intermediate Computation with Language Models, 2021, arXiv.
[53] Mohammad Bavarian, et al. Training Verifiers to Solve Math Word Problems, 2021, arXiv.
[54] Nicholas Carlini, et al. Unsolved Problems in ML Safety, 2021, arXiv.
[55] Ellie Pavlick, et al. Frequency Effects on Syntactic Rule Learning in Transformers, 2021, EMNLP.
[56] Michael S. Bernstein, et al. On the Opportunities and Risks of Foundation Models, 2021, arXiv.
[57] Daphne Ippolito, et al. Wordcraft: a Human-AI Collaborative Editor for Story Writing, 2021, arXiv.
[58] Sang Michael Xie, et al. Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning, 2021, NeurIPS.
[59] Luke Zettlemoyer, et al. Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right, 2021, EMNLP.
[60] Emily M. Bender, et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, 2021, FAccT.
[61] D. Klein, et al. Calibrate Before Use: Improving Few-Shot Performance of Language Models, 2021, ICML.
[62] Laria Reynolds, et al. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm, 2021, CHI Extended Abstracts.
[63] Noam M. Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, J. Mach. Learn. Res.
[64] Danqi Chen, et al. Making Pre-trained Language Models Better Few-shot Learners, 2021, ACL.
[65] Colin Raffel, et al. Extracting Training Data from Large Language Models, 2020, USENIX Security Symposium.
[66] Samuel R. Bowman, et al. When Do You Need Billions of Words of Pretraining Data?, 2020, ACL.
[67] Sanjeev Arora, et al. A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks, 2020, ICLR.
[68] Hinrich Schütze, et al. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners, 2020, NAACL.
[69] Dawn Song, et al. Measuring Massive Multitask Language Understanding, 2020, ICLR.
[70] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[71] Yejin Choi, et al. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models, 2020, Findings of EMNLP.
[72] Omer Levy, et al. Emergent linguistic structure in artificial neural networks trained by self-supervision, 2020, Proceedings of the National Academy of Sciences.
[73] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[74] Ming-Wei Chang, et al. REALM: Retrieval-Augmented Language Model Pre-Training, 2020, ICML.
[75] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, arXiv.
[76] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[77] José Camacho-Collados, et al. WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations, 2018, NAACL.
[78] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[79] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[80] Richard Socher, et al. The Natural Language Decathlon: Multitask Learning as Question Answering, 2018, arXiv.
[81] Quoc V. Le, et al. A Simple Method for Commonsense Reasoning, 2018, arXiv.
[82] Nathaniel J. Smith, et al. Bootstrapping language acquisition, 2017, Cognition.
[83] Chandler May, et al. Social Bias in Elicited Natural Language Inferences, 2017, EthNLP@EACL.
[84] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[85] Alex Graves, et al. Decoupled Neural Interfaces using Synthetic Gradients, 2016, ICML.
[86] Alex Graves, et al. Adaptive Computation Time for Recurrent Neural Networks, 2016, arXiv.
[87] Paul A. Lewis, et al. New Perspectives on Emergence in Economics, 2012.
[88] H. Hwang, et al. Basic Notions, 2022.
[89] Timothy O'Connor, et al. Emergence in Science and Philosophy, 2010.
[90] Percy Liang, et al. Semi-Supervised Learning for Natural Language, 2005.
[91] Scott Miller, et al. Name Tagging with Word Clusters and Discriminative Training, 2004, NAACL.
[92] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[93] James H. Martin, et al. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2000.
[94] Stephanie Forrest, et al. Emergent computation: self-organizing, collective, and cooperative phenomena in natural and artificial computing networks, 1990.
[95] Tad Hogg, et al. Phase Transitions in Artificial Intelligence Systems, 1987, Artif. Intell.
[96] Philip W. Anderson. More Is Different: Broken symmetry and the nature of the hierarchical structure of science, 1972.