Kavosh Asadi | Evan Cater | Dipendra Misra | Michael L. Littman
[1] Alborz Geramifard,et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping , 2008, UAI.
[2] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[3] Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.
[4] Martial Hebert,et al. Improving Multi-Step Prediction of Learned Time Series Models , 2015, AAAI.
[5] Pieter Abbeel,et al. Using inaccurate models in reinforcement learning , 2006, ICML.
[6] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[7] Tom Schaul,et al. The Predictron: End-To-End Learning and Planning , 2016, ICML.
[8] C. Rasmussen,et al. Improving PILCO with Bayesian Neural Network Dynamics Models , 2016 .
[9] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[10] Bernardo Ávila Pires,et al. Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models , 2016, COLT.
[11] Erik Talvitie,et al. Model Regularization for Stable Sample Rollouts , 2014, UAI.
[12] Erik Talvitie,et al. Self-Correcting Models for Model-Based Reinforcement Learning , 2016, AAAI.
[13] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[14] Kavosh Asadi,et al. Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning , 2018, ArXiv.
[15] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[16] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[17] A. Markman,et al. Retrospective Revaluation in Sequential Decision Making: A Tale of Two Systems , 2012, Journal of Experimental Psychology: General.
[18] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell.
[19] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[20] Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res.
[21] Kavosh Asadi,et al. Lipschitz Continuity in Model-based Reinforcement Learning , 2018, ICML.
[22] Sergey Levine,et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning , 2018, ArXiv.
[23] Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.
[24] Kavosh Asadi,et al. Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning , 2018, ArXiv.
[25] Satinder P. Singh,et al. Linear options , 2010, AAMAS.
[26] G. Box. Science and Statistics , 1976 .
[27] P. Dayan,et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.
[28] Rich Sutton,et al. A Deeper Look at Planning as Learning from Replay , 2015, ICML.
[29] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[30] Shalabh Bhatnagar,et al. Multi-Step Dyna Planning for Policy Evaluation and Control , 2009, NIPS.
[31] Kavosh Asadi Atui. Strengths, Weaknesses, and Combinations of Model-based and Model-free Reinforcement Learning , 2016 .
[32] Richard S. Sutton,et al. Integrated Modeling and Control Based on Reinforcement Learning , 1990, NIPS.
[33] M. Littman,et al. Mean Actor Critic , 2017, ArXiv.
[34] Daniel Nikovski,et al. Value-Aware Loss Function for Model-based Reinforcement Learning , 2017, AISTATS.
[35] Satinder Singh,et al. Value Prediction Network , 2017, NIPS.
[36] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[37] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[38] Nan Jiang,et al. The Dependence of Effective Planning Horizon on Model Accuracy , 2015, AAMAS.
[39] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput.
[40] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML '08.