Scaling Instruction-Finetuned Language Models
Andrew M. Dai, Hyung Won Chung, J. Dean, Xinyun Chen, S. Gu, Jacob Devlin, Xuezhi Wang, Slav Petrov, W. Fedus, Sharan Narang, Yi Tay, Barret Zoph, Yanping Huang, Ed Chi, Adam Roberts, Denny Zhou, Zhuyun Dai, Hongkun Yu, M. Dehghani, Aakanksha Chowdhery, S. Longpre, Siddhartha Brahma, Jason Wei, Mirac Suzgun, Albert Webson, Vincent Zhao, Gaurav Mishra, Le Hou, Eric Li, A. Yu, Quoc Le