Reinforced Self-Training (ReST) for Language Modeling
Caglar Gulcehre | Tom Le Paine | Srivatsan Srinivasan | Ksenia Konyushkova | Lotte Weerts | Abhishek Sharma | Aditya Siddhant | Alex Ahern | Miaosen Wang | Chenjie Gu | Wolfgang Macherey | Arnaud Doucet | Orhan Firat | Nando de Freitas
[1] Christopher D. Manning,et al. Direct Preference Optimization: Your Language Model is Secretly a Reward Model , 2023, NeurIPS.
[2] Nicolas Papernot,et al. The Curse of Recursion: Training on Generated Data Makes Models Forget , 2023, ArXiv.
[3] Yejin Choi,et al. Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing , 2023, ArXiv.
[4] T. Zhang,et al. RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment , 2023, ArXiv.
[5] Marco Tulio Ribeiro,et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4 , 2023, ArXiv.
[6] Geoffrey Irving,et al. Solving math word problems with process- and outcome-based feedback , 2022, ArXiv.
[7] J. Schulman,et al. Scaling Laws for Reward Model Overoptimization , 2022, ICML.
[8] Lisa Anne Hendricks,et al. Improving alignment of dialogue agents via targeted human judgements , 2022, ArXiv.
[9] Dmitrii Krasheninnikov,et al. Defining and Characterizing Reward Hacking , 2022, ArXiv.
[10] Chris Dyer,et al. MAD for Robust Reinforcement Learning in Machine Translation , 2022, ArXiv.
[11] Gerard de Melo,et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models , 2022, ArXiv.
[12] R. Laroche,et al. When does return-conditioned supervised learning work for offline reinforcement learning? , 2022, NeurIPS.
[13] Tom B. Brown,et al. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback , 2022, ArXiv.
[14] Noah D. Goodman,et al. STaR: Bootstrapping Reasoning With Reasoning , 2022, NeurIPS.
[15] Ryan J. Lowe,et al. Training language models to follow instructions with human feedback , 2022, NeurIPS.
[16] A. Cherepanov,et al. Competition-level code generation with AlphaCode , 2022, Science.
[17] Geoffrey Irving,et al. Red Teaming Language Models with Language Models , 2022, EMNLP.
[18] Po-Sen Huang,et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher , 2021, ArXiv.
[19] Jan Leike,et al. Recursively Summarizing Books with Human Feedback , 2021, ArXiv.
[20] Markus Freitag,et al. Scaling Laws for Neural Machine Translation , 2021, ICLR.
[21] Oriol Vinyals,et al. Highly accurate protein structure prediction with AlphaFold , 2021, Nature.
[22] Siqi Liu,et al. Launchpad: A Programming Model for Distributed Machine Learning Research , 2021, ArXiv.
[23] Pieter Abbeel,et al. Decision Transformer: Reinforcement Learning via Sequence Modeling , 2021, NeurIPS.
[24] Oriol Vinyals,et al. Machine Translation Decoding beyond Beam Search , 2021, EMNLP.
[25] Razvan Pascanu,et al. Regularized Behavior Value Estimation , 2021, ArXiv.
[26] Olivier Pietquin,et al. Supervised Seeded Iterated Learning for Interactive Language Learning , 2020, EMNLP.
[27] Alon Lavie,et al. COMET: A Neural Framework for MT Evaluation , 2020, EMNLP.
[28] Ryan J. Lowe,et al. Learning to summarize from human feedback , 2020, NeurIPS.
[29] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[30] Justin Fu,et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning , 2020, ArXiv.
[31] M. Utiyama,et al. Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios , 2020, NAACL.
[32] Thibault Sellam,et al. BLEURT: Learning Robust Metrics for Text Generation , 2020, ACL.
[33] Aaron C. Courville,et al. Countering Language Drift with Seeded Iterated Learning , 2020, ICML.
[34] Quoc V. Le,et al. Self-Training With Noisy Student Improves ImageNet Classification , 2020, CVPR.
[35] Marc'Aurelio Ranzato,et al. Revisiting Self-Training for Neural Sequence Generation , 2019, ICLR.
[36] H. Francis Song,et al. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control , 2019, ICLR.
[37] Satinder Singh,et al. Self-Imitation Learning , 2018, ICML.
[38] Taku Kudo,et al. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates , 2018, ACL.
[39] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[40] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[41] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[42] Jiajun Zhang,et al. Exploiting Source-side Monolingual Data in Neural Machine Translation , 2016, EMNLP.
[43] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[44] S. Kirby,et al. Iterated learning and the evolution of language , 2014, Current Opinion in Neurobiology.
[45] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[46] James R. Curran,et al. Bootstrapping POS-taggers using unlabelled data , 2003, CoNLL.
[47] David Yarowsky,et al. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.
[48] H. J. Scudder,et al. Probability of error of some adaptive pattern-recognition machines , 1965, IEEE Trans. Inf. Theory.
[49] R. Bellman. A Markovian Decision Process , 1957, Journal of Mathematics and Mechanics.
[50] George F. Foster,et al. Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust , 2022, WMT.
[51] S. Levine,et al. Should I Run Offline Reinforcement Learning or Behavioral Cloning? , 2022, ICLR.
[52] Marc G. Bellemare,et al. Beyond Tabula Rasa: Reincarnating Reinforcement Learning , 2022, ArXiv.
[53] Lisa Anne Hendricks,et al. An empirical analysis of compute-optimal large language model training , 2022, NeurIPS.
[54] Sandy H. Huang,et al. On Multi-objective Policy Optimization as a Tool for Reinforcement Learning , 2021, ArXiv.
[55] A. Lavie,et al. Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain , 2021, WMT.
[56] He He,et al. Text Generation by Learning from Demonstrations , 2020, ArXiv.
[57] Srivatsan Srinivasan,et al. The DeepMind Chinese–English Document Translation System at WMT2020 , 2020, WMT.
[58] Philipp Koehn,et al. Findings of the WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment , 2020, WMT.
[59] Marcello Federico,et al. Report on the 11th IWSLT evaluation campaign , 2014, IWSLT.
[60] Martin A. Riedmiller,et al. Batch Reinforcement Learning , 2012, Reinforcement Learning.
[61] Dean A. Pomerleau. ALVINN: An Autonomous Land Vehicle in a Neural Network , 1988, NIPS.