Scaling Instruction-Finetuned Language Models
Andrew M. Dai | Hyung Won Chung | J. Dean | Yanping Huang | Xinyun Chen | S. Gu | Jacob Devlin | Xuezhi Wang | Slav Petrov | W. Fedus | Sharan Narang | Yi Tay | Barret Zoph | Ed Chi | Adam Roberts | Denny Zhou | Zhuyun Dai | Hongkun Yu | Mostafa Dehghani | Aakanksha Chowdhery | Jason Wei | S. Longpre | Siddhartha Brahma | Mirac Suzgun | Albert Webson | Vincent Zhao | Gaurav Mishra | Le Hou | Eric Li | A. Yu | Quoc V. Le | Dasha Valter
[1] Quoc V. Le, et al. Transcending Scaling Laws with 0.1% Extra Compute, 2022, arXiv:2210.11399.
[2] S. Gu, et al. Large Language Models Can Self-Improve, 2022, arXiv:2210.11610.
[3] Quoc V. Le, et al. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, 2022, ACL.
[4] Hyung Won Chung, et al. Language Models are Multilingual Chain-of-Thought Reasoners, 2022, ICLR.
[5] Yuhuai Wu, et al. Solving Quantitative Reasoning Problems with Language Models, 2022, NeurIPS.
[6] J. Dean, et al. Emergent Abilities of Large Language Models, 2022, arXiv.
[7] Gerard de Melo, et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, 2022, arXiv.
[8] S. Gu, et al. Large Language Models are Zero-Shot Reasoners, 2022, arXiv.
[9] Arun Tejasvi Chaganty, et al. Dialog Inpainting: Turning Documents into Dialogs, 2022, ICML.
[10] G. Karypis, et al. Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning, 2022, NAACL.
[11] Kuntal Kumar Pal, et al. Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks, 2022, arXiv.
[12] Hyung Won Chung, et al. What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?, 2022, ICML.
[13] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[14] Andrew Zaldivar, et al. Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI, 2022, FAccT.
[15] Marc van Zee, et al. Scaling Up Models and Data with t5x and seqio, 2022, J. Mach. Learn. Res.
[16] Lisa Anne Hendricks, et al. Training Compute-Optimal Large Language Models, 2022, arXiv.
[17] Noah D. Goodman, et al. STaR: Bootstrapping Reasoning With Reasoning, 2022, arXiv:2203.14465.
[18] D. Schuurmans, et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models, 2022, arXiv.
[19] Swaroop Mishra, et al. How Many Data Samples is an Additional Instruction Worth?, 2022, Findings.
[20] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[21] Cherepanov, et al. Competition-level code generation with AlphaCode, 2022, Science.
[22] Dale Schuurmans, et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models, 2022, arXiv.
[23] Quoc V. Le, et al. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts, 2021, ICML.
[24] M. Lewis, et al. MetaICL: Learning to Learn In Context, 2021, NAACL.
[25] Alexander M. Rush, et al. Multitask Prompted Training Enables Zero-Shot Task Generalization, 2021, ICLR.
[26] Quoc V. Le, et al. Finetuned Language Models Are Zero-Shot Learners, 2021, ICLR.
[27] Rami Al-Rfou, et al. ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models, 2021, Transactions of the Association for Computational Linguistics.
[28] Vinh Q. Tran, et al. Unifying Language Learning Paradigms, 2022, arXiv.
[29] S. Muresan, et al. Continual-T0: Progressively Instructing 50+ Tasks to Language Models Without Forgetting, 2022, arXiv.
[30] Po-Sen Huang, et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher, 2021, arXiv.
[31] David Bieber, et al. Show Your Work: Scratchpads for Intermediate Computation with Language Models, 2021, arXiv.
[32] Mohammad Bavarian, et al. Training Verifiers to Solve Math Word Problems, 2021, arXiv.
[33] Eunsol Choi, et al. CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge, 2021, NeurIPS Datasets and Benchmarks.
[34] Michael S. Bernstein, et al. On the Opportunities and Risks of Foundation Models, 2021, arXiv.
[35] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, arXiv.
[36] Xiang Ren, et al. CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP, 2021, EMNLP.
[37] Brian Lester, et al. The Power of Scale for Parameter-Efficient Prompt Tuning, 2021, EMNLP.
[38] Dan Klein, et al. Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections, 2021, EMNLP.
[39] Sonal Gupta, et al. Muppet: Massive Multi-task Representations with Pre-Finetuning, 2021, EMNLP.
[40] Jonathan Berant, et al. Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies, 2021, Transactions of the Association for Computational Linguistics.
[41] Zhucheng Tu, et al. Open-Domain Question Answering Goes Conversational via Question Rewriting, 2020, NAACL.
[42] Eunsol Choi, et al. QED: A Framework and Dataset for Explanations in Question Answering, 2020, Transactions of the Association for Computational Linguistics.
[43] Dawn Song, et al. Measuring Massive Multitask Language Understanding, 2020, ICLR.
[44] Quoc V. Le, et al. Searching for Efficient Transformers for Language Modeling, 2021, NeurIPS.
[45] Dinesh Garg, et al. Explanations for CommonsenseQA: New Dataset and Models, 2021, ACL.
[46] David Patterson, et al. A domain-specific supercomputer for training deep neural networks, 2020, Commun. ACM.
[47] Jonathan Berant, et al. Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge, 2020, arXiv.
[48] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[49] Percy Liang, et al. Graph-based, Self-Supervised Program Repair from Diagnostic Feedback, 2020, ICML.
[50] Hannaneh Hajishirzi, et al. UnifiedQA: Crossing Format Boundaries With a Single QA System, 2020, Findings.
[51] Eunsol Choi, et al. TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages, 2020, Transactions of the Association for Computational Linguistics.
[52] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, arXiv.
[53] Ashish Sabharwal, et al. QASC: A Dataset for Question Answering via Sentence Composition, 2019, AAAI.
[54] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[55] Thomas Lukasiewicz, et al. Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations, 2019, ACL.
[56] Bill Byrne, et al. Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset, 2019, EMNLP.
[57] Richard Socher, et al. Explain Yourself! Leveraging Language Models for Commonsense Reasoning, 2019, ACL.
[58] Yue Zhang, et al. Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation, 2019, ACL.
[59] Omer Levy, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019, NeurIPS.
[60] Inioluwa Deborah Raji, et al. Model Cards for Model Reporting, 2018, FAT.
[61] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[62] Thomas Lukasiewicz, et al. e-SNLI: Natural Language Inference with Natural Language Explanations, 2018, NeurIPS.
[63] Rachel Rudinger, et al. Gender Bias in Coreference Resolution, 2018, NAACL.
[64] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[65] Wang Ling, et al. Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, 2017, ACL.