Fast Exploration with Simplified Models and Approximately Optimistic Planning in Model Based Reinforcement Learning
[1] Joshua B. Tenenbaum, et al. Building machines that learn and think like people, 2016, Behavioral and Brain Sciences.
[2] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[3] Pieter Spronck, et al. Monte-Carlo Tree Search: A New Framework for Game AI, 2008, AIIDE.
[4] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[5] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[6] Nando de Freitas, et al. Playing hard exploration games by watching YouTube, 2018, NeurIPS.
[7] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[8] Razvan Pascanu, et al. Imagination-Augmented Agents for Deep Reinforcement Learning, 2017, NIPS.
[9] Peter Dayan, et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search, 2012, NIPS.
[10] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[11] Thomas J. Walsh, et al. Knows what it knows: a framework for self-aware learning, 2008, ICML.
[12] Nan Jiang, et al. The Dependence of Effective Planning Horizon on Model Accuracy, 2015, AAMAS.
[13] Kamyar Azizzadenesheli, et al. Efficient Exploration Through Bayesian Deep Q-Networks, 2018, Information Theory and Applications Workshop (ITA).
[14] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[15] Murray Shanahan, et al. Towards Deep Symbolic Reinforcement Learning, 2016, ArXiv.
[16] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[17] O. Teytaud, et al. Monte Carlo Tree Search in Go, 2013.
[18] Nicholas Roy, et al. Provably Efficient Learning with Typed Parametric Models, 2009, J. Mach. Learn. Res.
[19] Ross B. Girshick. Fast R-CNN, 2015, ICCV.
[20] Georg Ostrovski, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[21] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[22] Sergey Levine, et al. Deep visual foresight for planning robot motion, 2016, IEEE International Conference on Robotics and Automation (ICRA).
[23] Kaiming He, et al. Mask R-CNN, 2017, IEEE International Conference on Computer Vision (ICCV).
[24] Christopher D. Rosin, et al. Nested Rollout Policy Adaptation for Monte Carlo Tree Search, 2011, IJCAI.
[25] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[26] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[27] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[28] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.
[29] John N. Tsitsiklis, et al. Bias and Variance Approximation in Value Function Estimates, 2007, Manag. Sci.
[30] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[31] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[32] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[33] Joshua B. Tenenbaum, et al. Human Learning in Atari, 2017, AAAI Spring Symposia.
[34] Shie Mannor, et al. Bayesian Reinforcement Learning: A Survey, 2015, Found. Trends Mach. Learn.
[35] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[36] Umesh V. Vazirani, et al. An Introduction to Computational Learning Theory, 1994.
[37] Andrea Lockerd Thomaz, et al. Object focused q-learning for autonomous agents, 2013, AAMAS.
[38] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933, Biometrika.
[39] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[40] Stefanie Tellex, et al. Deep Abstract Q-Networks, 2017, AAMAS.
[41] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[42] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[43] Ross A. Knepper, et al. DeepMPC: Learning Deep Latent Features for Model Predictive Control, 2015, Robotics: Science and Systems.
[44] Ali Farhadi, et al. YOLO9000: Better, Faster, Stronger, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Alexei A. Efros, et al. Investigating Human Priors for Playing Video Games, 2018, ICML.
[46] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[47] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[48] Benjamin Van Roy, et al. Why is Posterior Sampling Better than Optimism for Reinforcement Learning?, 2016, ICML.
[49] Carlos Guestrin, et al. Generalizing plans to new environments in relational MDPs, 2003, IJCAI.
[50] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[51] Andre Cohen, et al. An object-oriented representation for efficient reinforcement learning, 2008, ICML.
[52] Tom Schaul, et al. Deep Q-learning From Demonstrations, 2017, AAAI.