Alec Radford | Jeff Wu | Dario Amodei | Daniel M. Ziegler | Ryan Lowe | Paul Christiano | Nisan Stiennon | Long Ouyang | Chelsea Voss
[1] Richard E. Turner, et al. Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control, 2016, ICML.
[2] Iryna Gurevych, et al. Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation, 2019, IJCAI.
[3] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[4] Douglas Eck, et al. Tuning Recurrent Neural Networks with Reinforcement Learning, 2016, ICLR.
[5] Yao Zhao, et al. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization, 2020, ICML.
[6] Richard Socher, et al. A Deep Reinforced Model for Abstractive Summarization, 2017, ICLR.
[7] Xu Tan, et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation, 2019, ICML.
[8] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[9] Paul Covington, et al. Deep Neural Networks for YouTube Recommendations, 2016, RecSys.
[10] Oleg O. Sushkov, et al. Scaling data-driven robotics with reward sketching and batch reinforcement learning, 2019, Robotics: Science and Systems.
[11] Yang Fang, et al. Abstract Text Summarization with a Convolutional Seq2seq Model, 2019, Applied Sciences.
[12] Yejin Choi, et al. The Curious Case of Neural Text Degeneration, 2019, ICLR.
[13] Richard Socher, et al. Neural Text Summarization: A Critical Evaluation, 2019, EMNLP.
[14] Garrison W. Cottrell, et al. Automatic combination of multiple ranked retrieval systems, 1994, SIGIR '94.
[15] Thorsten Joachims, et al. Accurately interpreting clickthrough data as implicit feedback, 2005, SIGIR '05.
[16] Richard Socher, et al. The Natural Language Decathlon: Multitask Learning as Question Answering, 2018, ArXiv.
[17] Joelle Pineau, et al. An Actor-Critic Algorithm for Sequence Prediction, 2016, ICLR.
[18] Jianfeng Gao, et al. Towards Coherent and Cohesive Long-form Text Generation, 2018, Proceedings of the First Workshop on Narrative Understanding.
[19] Shashi Narayan, et al. Leveraging Pre-trained Checkpoints for Sequence Generation Tasks, 2019, Transactions of the Association for Computational Linguistics.
[20] Jason Weston, et al. Neural Text Generation with Unlikelihood Training, 2019, ICLR.
[21] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[22] Thorsten Joachims, et al. Optimizing search engines using clickthrough data, 2002, KDD.
[23] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[24] Ali Farhadi, et al. Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping, 2020, ArXiv.
[25] Norbert Fuhr, et al. Optimum polynomial retrieval functions based on the probability ranking principle, 1989, TOIS.
[26] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.
[27] Mark O. Riedl, et al. Controllable Neural Story Plot Generation via Reward Shaping, 2018, IJCAI.
[28] Yuxiang Wu, et al. Learning to Extract Coherent Summary via Deep Reinforcement Learning, 2018, AAAI.
[29] Xiaodong Liu, et al. Unified Language Model Pre-training for Natural Language Understanding and Generation, 2019, NeurIPS.
[30] Christopher D. Manning, et al. Get To The Point: Summarization with Pointer-Generator Networks, 2017, ACL.
[31] Alexander M. Rush, et al. Abstractive Sentence Summarization with Attentive Recurrent Neural Networks, 2016, NAACL.
[32] Mohit Bansal, et al. Polite Dialogue Generation Without Parallel Data, 2018, TACL.
[33] Richard M. Schwartz, et al. Hedge Trimmer: A Parse-and-Trim Approach to Headline Generation, 2003, HLT-NAACL.
[34] Jiusheng Chen, et al. ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, 2020, EMNLP.
[35] Shane Legg, et al. Reward learning from human preferences and demonstrations in Atari, 2018, NeurIPS.
[36] Ke Xu, et al. Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models, 2020, AAAI.
[37] Quoc V. Le, et al. Semi-supervised Sequence Learning, 2015, NIPS.
[38] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[39] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[40] Tie-Yan Liu, et al. Learning to rank for information retrieval, 2009, SIGIR.
[41] Phil Blunsom, et al. Teaching Machines to Read and Comprehend, 2015, NIPS.
[42] Natasha Jaques, et al. Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog, 2019, ArXiv.
[43] Jason Weston, et al. ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons, 2019, ArXiv.
[44] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[45] Natalie Schluter, et al. The limits of automatic summarisation according to ROUGE, 2017, EACL.
[46] R. Likert. A Technique for the Measurement of Attitudes, 1932, Archives of Psychology.
[47] Ryan McDonald, et al. On Faithfulness and Factuality in Abstractive Summarization, 2020, ACL.
[48] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[49] Mark O. Riedl, et al. Controllable Neural Story Generation via Reinforcement Learning, 2018, ArXiv.
[50] Ido Dagan, et al. Better Rewards Yield Better Summaries: Learning to Summarise Without References, 2019, EMNLP.
[51] Dilek Z. Hakkani-Tür, et al. Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators, 2019, INLG.
[52] Jackie Chi Kit Cheung, et al. BanditSum: Extractive Summarization as a Contextual Bandit, 2018, EMNLP.
[53] Chin-Yew Lin, et al. Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics, 2004, ACL.
[54] Hal Daumé, et al. Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback, 2017, EMNLP.
[55] Florian Schmidt. Generalization in Generation: A closer look at Exposure Bias, 2019, NGT@EMNLP-IJCNLP.
[56] Shahram Khadivi, et al. Can Neural Machine Translation be Improved with User Feedback?, 2018, NAACL.
[57] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[58] Stefan Riezler, et al. Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback, 2018, ACL.
[59] Daphne Ippolito, et al. Trading Off Diversity and Quality in Natural Language Generation, 2020, HUMEVAL.
[60] Sanja Fidler, et al. Teaching Machines to Describe Images via Natural Language Feedback, 2017, ArXiv.
[61] Dario Amodei, et al. Supervising strong learners by amplifying weak experts, 2018, ArXiv.
[62] Benno Stein, et al. TL;DR: Mining Reddit to Learn Automatic Summarization, 2017, NFiS@EMNLP.
[63] Jason Weston, et al. A Neural Attention Model for Abstractive Sentence Summarization, 2015, EMNLP.
[64] Marc'Aurelio Ranzato, et al. Sequence Level Training with Recurrent Neural Networks, 2015, ICLR.
[65] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[66] Percy Liang, et al. The price of debiasing automatic metrics in natural language evaluation, 2018, ACL.
[67] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[68] Anca D. Dragan, et al. Reward-rational (implicit) choice: A unifying formalism for reward learning, 2020, NeurIPS.
[69] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[70] Jason Weston, et al. Learning from Dialogue after Deployment: Feed Yourself, Chatbot!, 2019, ACL.
[71] Shane Legg, et al. Scalable agent alignment via reward modeling: a research direction, 2018, ArXiv.
[72] Jason Weston, et al. Finding Generalizable Evidence by Learning to Convince Q&A Models, 2019, EMNLP.
[73] Alec Radford, et al. Fine-Tuning Language Models from Human Preferences, 2019, ArXiv.