Improving Language Models with Advantage-based Offline Policy Gradients