The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach
Iulian Serban | Chinnadhurai Sankar | Michael Pieper | Joelle Pineau | Yoshua Bengio
[1] Leo Breiman,et al. Bagging Predictors , 1996, Machine Learning.
[2] Olivier Sigaud,et al. Learning the structure of Factored Markov Decision Processes in reinforcement learning problems , 2006, ICML.
[3] Robert Givan,et al. Model Reduction Techniques for Computing Approximately Optimal Solutions for Markov Decision Processes , 1997, UAI.
[4] David Suendermann-Oeft,et al. Are We There Yet? Research in Commercial Spoken Dialog Systems , 2009, TSD.
[5] Benjamin Van Roy,et al. Near-optimal Reinforcement Learning in Factored MDPs , 2014, NIPS.
[6] Demis Hassabis,et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.
[7] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[8] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[9] Joelle Pineau,et al. Building Adaptive Dialogue Systems Via Bayes-Adaptive POMDPs , 2012, IEEE Journal of Selected Topics in Signal Processing.
[10] André da Motta Salles Barreto,et al. An Expectation-Maximization Algorithm to Compute a Stochastic Factorization From Data , 2015, IJCAI.
[11] Ronald E. Parr,et al. Hierarchical control and learning for Markov decision processes , 1998 .
[12] Peter Stone,et al. State Abstraction Discovery from Irrelevant State Variables , 2005, IJCAI.
[13] Leonid Kuvayev. Approximation in Model-Based Learning , 1997 .
[14] Martial Hebert,et al. Improved Learning of Dynamics Models for Control , 2016, ISER.
[15] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[16] Nan Jiang,et al. Abstraction Selection in Model-based Reinforcement Learning , 2015, ICML.
[17] Bing Liu,et al. Iterative policy learning in end-to-end trainable task-oriented neural dialog models , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[18] Nan Rosemary Ke,et al. The Octopus Approach to the Alexa Competition : A Deep Ensemble-based Socialbot , 2017 .
[19] Maxine Eskénazi,et al. POMDP-based Let's Go system for spoken dialog challenge , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[20] Joelle Pineau,et al. Combined Reinforcement Learning via Abstract Representations , 2018, AAAI.
[21] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[22] Andrew W. Moore,et al. Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.
[23] Yi Pan,et al. Conversational AI: The Science Behind the Alexa Prize , 2018, ArXiv.
[24] Yuandong Tian,et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees , 2018, ICLR.
[25] Marilyn A. Walker,et al. An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email , 2000, J. Artif. Intell. Res..
[26] Michael L. Littman,et al. Near Optimal Behavior via Approximate State Abstraction , 2016, ICML.
[27] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[28] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[29] Marc G. Bellemare,et al. Bayesian Learning of Recursively Factored Environments , 2013, ICML.
[30] Maxine Eskénazi,et al. Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning , 2016, SIGDIAL Conference.
[31] Satinder Singh,et al. Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System , 2002 .
[32] Andreas Stolcke,et al. Dialogue act modeling for automatic tagging and recognition of conversational speech , 2000, CL.
[33] Steve J. Young,et al. Partially observable Markov decision processes for spoken dialog systems , 2007, Comput. Speech Lang..
[34] Jürgen Schmidhuber,et al. Reinforcement Learning Soccer Teams with Incomplete World Models , 1999, Auton. Robots.
[35] Sergey Levine,et al. Goal-driven dynamics learning via Bayesian optimization , 2017, 2017 IEEE 56th Annual Conference on Decision and Control (CDC).
[36] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[37] Marek Petrik,et al. RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning , 2014, NIPS.
[38] Jonathan P. How,et al. Real-World Reinforcement Learning via Multifidelity Simulators , 2015, IEEE Transactions on Robotics.
[39] Yann Dauphin,et al. Deal or No Deal? End-to-End Learning of Negotiation Dialogues , 2017, EMNLP.
[40] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[41] Stephane Ross,et al. Interactive Learning for Sequential Decisions and Predictions , 2013 .
[42] Iñigo Casanueva,et al. Neural User Simulation for Corpus-based Policy Optimisation of Spoken Dialogue Systems , 2018, SIGDIAL Conference.
[43] Marilyn A. Walker,et al. Reinforcement Learning for Spoken Dialogue Systems , 1999, NIPS.
[44] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.
[45] Kallirroi Georgila,et al. User simulation for spoken dialogue systems: learning and evaluation , 2006, INTERSPEECH.
[46] Richard S. Sutton,et al. Roles of Macro-Actions in Accelerating Reinforcement Learning , 1998 .
[47] André da Motta Salles Barreto,et al. Incremental Stochastic Factorization for Online Reinforcement Learning , 2016, AAAI.
[48] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[49] Robert L. Smith,et al. Aggregation in Dynamic Programming , 1987, Oper. Res..
[50] Zhou Yu,et al. Strategy and Policy Learning for Non-Task-Oriented Conversational Systems , 2016, SIGDIAL Conference.
[51] Daniel Nikovski,et al. Value-Aware Loss Function for Model-based Reinforcement Learning , 2017, AISTATS.
[52] Erik Talvitie,et al. Agnostic System Identification for Monte Carlo Planning , 2015, AAAI.
[53] Thomas J. Walsh,et al. Towards a Unified Theory of State Abstraction for MDPs , 2006, AI&M.
[54] David R. Traum,et al. Multi-party, Multi-issue, Multi-strategy Negotiation for Multi-modal Virtual Agents , 2008, IVA.
[55] Ross A. Knepper,et al. DeepMPC: Learning Deep Latent Features for Model Predictive Control , 2015, Robotics: Science and Systems.
[56] Dileep George,et al. Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics , 2017, ICML.
[57] Trevor Darrell,et al. End-to-end training of deep visuomotor policies , 2016 .
[58] David Vandyke,et al. Continuously Learning Neural Dialogue Management , 2016, ArXiv.
[59] Alexandre B. Tsybakov,et al. Introduction to Nonparametric Estimation , 2008, Springer series in statistics.
[60] Jing He,et al. A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems , 2016, INTERSPEECH.
[61] H. Cuayahuitl,et al. Human-computer dialogue simulation using hidden Markov models , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding.
[62] Doina Precup,et al. Metrics for Finite Markov Decision Processes , 2004, AAAI.
[63] Kavosh Asadi,et al. Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning , 2018, ArXiv.
[64] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[65] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[66] Milica Gasic,et al. Effective handling of dialogue state in the hidden information state POMDP-based dialogue manager , 2011, TSLP.
[67] Joelle Pineau,et al. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation , 2016, EMNLP.
[68] Regina Barzilay,et al. Language Understanding for Text-based Games using Deep Reinforcement Learning , 2015, EMNLP.
[69] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[70] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[71] Jing He,et al. Policy Networks with Two-Stage Training for Dialogue Systems , 2016, SIGDIAL Conference.
[72] Craig Boutilier,et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage , 1999, J. Artif. Intell. Res..
[73] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[74] J. Andrew Bagnell,et al. Agnostic System Identification for Model-Based Reinforcement Learning , 2012, ICML.
[75] Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
[76] Benjamin Van Roy. Performance Loss Bounds for Approximate Value Iteration with State Aggregation , 2006, Math. Oper. Res..
[77] Bo Wu,et al. Model-based Bayesian Reinforcement Learning in Factored Markov Decision Process , 2014, J. Comput..
[78] Jürgen Schmidhuber,et al. Efficient model-based exploration , 1998 .
[79] Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.
[80] Stefan Ultes,et al. Feudal Reinforcement Learning for Dialogue Management in Large Domains , 2018, NAACL.
[81] Satinder Singh,et al. Learning to Query, Reason, and Answer Questions On Ambiguous Texts , 2016, ICLR.
[82] John N. Tsitsiklis,et al. Neuro-dynamic programming: an overview , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[83] D. Bertsekas,et al. Adaptive aggregation methods for infinite horizon dynamic programming , 1989 .
[84] Noam Brown,et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals , 2018, Science.
[85] Eric Atwell,et al. Chatbots: Are they Really Useful? , 2007, LDV Forum.
[86] Peter A. Heeman,et al. Representing the Reinforcement Learning state in a negotiation dialogue , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.
[87] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[88] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[89] Doina Precup,et al. Methods for Computing State Similarity in Markov Decision Processes , 2006, UAI.
[90] Joelle Pineau,et al. A Deep Reinforcement Learning Chatbot , 2017, ArXiv.
[91] Grace Chung,et al. Developing a Flexible Spoken Dialog System Using Simulation , 2004, ACL.
[92] Roberto Pieraccini,et al. User modeling for spoken dialogue system evaluation , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.
[93] Kam-Fai Wong,et al. Integrating planning for task-completion dialogue policy learning , 2018, ACL.
[94] Michael L. Littman,et al. Efficient Structure Learning in Factored-State MDPs , 2007, AAAI.
[95] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[96] Joelle Pineau,et al. Model-Based Bayesian Reinforcement Learning in Large Structured Domains , 2008, UAI.
[97] Stefan Lee,et al. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[98] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[99] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[100] Shie Mannor,et al. Off-policy Model-based Learning under Unknown Factored Dynamics , 2015, ICML.
[101] Peter Henderson,et al. An Introduction to Deep Reinforcement Learning , 2018, Found. Trends Mach. Learn..
[102] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[103] Ramón López-Cózar,et al. Automatic creation of scenarios for evaluating spoken dialogue systems via user-simulation , 2016, Knowl. Based Syst..
[104] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[105] Roberto Pieraccini,et al. A stochastic model of human-machine interaction for learning dialog strategies , 2000, IEEE Trans. Speech Audio Process..
[106] Hui Ye,et al. Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System , 2007, NAACL.
[107] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[108] Romain Laroche,et al. Incremental Human-Machine Dialogue Simulation , 2017, IWSDS.
[109] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[110] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[111] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[112] S. Young,et al. Scaling POMDPs for Spoken Dialog Management , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[113] S. Singh,et al. Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System , 2002, J. Artif. Intell. Res..
[114] Sergey Levine,et al. Deep visual foresight for planning robot motion , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).