Natasha Jaques | Rosalind W. Picard | Shixiang Gu | Craig Ferguson | Judy Hanwen Shen | Asma Ghandeharioun | Agata Lapedriza | Noah Jones
[1] M. Kárný, et al. Stochastic Optimal Control, 1988.
[2] Leslie Pack Kaelbling, et al. Learning to Achieve Goals, 1993, IJCAI.
[3] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[4] Jennifer Hay. Functions of humor in the conversations of men and women, 2000.
[5] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[6] Candace L. Sidner, et al. Where to look: a study of human-robot engagement, 2004, IUI '04.
[7] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[8] Emanuel Todorov, et al. Linearly-solvable Markov decision problems, 2006, NIPS.
[9] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[10] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[11] Harry Weger, et al. Active Listening in Peer Interviews: The Influence of Message Paraphrasing on Perceptions of Listening Skill, 2010.
[12] Lauren E. Scissors, et al. Language Style Matching Predicts Relationship Initiation and Stability, 2011, Psychological Science.
[13] Cristian Danescu-Niculescu-Mizil, et al. Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs, 2011, CMCL@ACL.
[14] Milica Gasic, et al. On-line policy optimisation of spoken dialogue systems via live interaction with human subjects, 2011, IEEE Workshop on Automatic Speech Recognition & Understanding.
[15] Martha White, et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[16] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.
[17] G. Bodie, et al. Listening Competence in Initial Interactions I: Distinguishing Between What Listening Is and What Listeners Do, 2012.
[18] Vicenç Gómez, et al. Optimal control as a graphical model inference problem, 2009, Machine Learning.
[19] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[20] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[21] Marco Pavone, et al. Stochastic Optimal Control, 2015.
[22] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[23] Andrea J. Vickery, et al. The Role of “Active Listening” in Informal Helping Conversations: Impact on Perceptions of Listener Helpfulness, Sensitivity, and Supportiveness and Discloser Emotional Improvement, 2015.
[24] Jianfeng Gao, et al. Deep Reinforcement Learning for Dialogue Generation, 2016, EMNLP.
[25] Joelle Pineau, et al. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models, 2015, AAAI.
[26] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[27] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[28] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[29] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[30] Jing He, et al. Policy Networks with Two-Stage Training for Dialogue Systems, 2016, SIGDIAL Conference.
[31] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[32] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[33] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[34] Alan Ritter, et al. Adversarial Learning for Neural Dialogue Generation, 2017, EMNLP.
[35] Iyad Rahwan, et al. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm, 2017, EMNLP.
[36] Jason Weston, et al. Dialogue Learning With Human-In-The-Loop, 2016, ICLR.
[37] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[38] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[39] Stefan Ultes, et al. Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management, 2017, SIGDIAL Conference.
[40] Joelle Pineau, et al. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues, 2016, AAAI.
[41] Joelle Pineau, et al. A Deep Reinforcement Learning Chatbot, 2017, ArXiv.
[42] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[43] Bing Liu, et al. Iterative policy learning in end-to-end trainable task-oriented neural dialog models, 2017, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[44] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[45] Richard E. Turner, et al. Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control, 2016, ICML.
[46] Holger Schwenk, et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, 2017, EMNLP.
[47] Sergey Levine, et al. Uncertainty-Aware Reinforcement Learning for Collision Avoidance, 2017, ArXiv.
[48] Kamyar Azizzadenesheli, et al. Efficient Exploration Through Bayesian Deep Q-Networks, 2018, Information Theory and Applications Workshop (ITA).
[49] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[50] Gunhee Kim, et al. A Hierarchical Latent Structure for Variational Conversation Modeling, 2018, NAACL.
[51] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[52] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[53] Zhou Yu, et al. Sentiment Adaptive End-to-End Dialog Systems, 2018, ACL.
[54] Dilek Z. Hakkani-Tür, et al. Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems, 2018, NAACL.
[55] Mehrdad Farajtabar, et al. More Robust Doubly Robust Off-policy Evaluation, 2018, ICML.
[56] Bing Liu, et al. Bootstrapping a Neural Conversational Agent with Dialogue Self-Play, Crowdsourcing and On-Line Reinforcement Learning, 2018, NAACL.
[57] M. de Rijke, et al. Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning, 2018, AAAI.
[58] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[59] Natasha Jaques, et al. Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems, 2019, NeurIPS.
[60] Emma Brunskill, et al. Off-Policy Policy Gradient with State Distribution Correction, 2019, UAI.
[61] Thomas Brox, et al. CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity, 2019, arXiv:1902.05605.
[62] Tom B. Brown, et al. Fine-Tuning Language Models from Human Preferences, 2019, ArXiv.
[63] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[64] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[65] Pascale Fung, et al. HappyBot: Generating Empathetic Dialogue Responses by Improving User Experience Look-ahead, 2019, ArXiv.
[66] Jason Weston, et al. Learning from Dialogue after Deployment: Feed Yourself, Chatbot!, 2019, ACL.
[67] Dale Schuurmans, et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning, 2019, ArXiv.
[68] Marc G. Bellemare, et al. Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift, 2019, AAAI.
[69] Harry Shum, et al. The Design and Implementation of XiaoIce, an Empathetic Social Chatbot, 2018, CL.