暂无分享,去创建一个
[1] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[2] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[3] Andre Cohen,et al. An object-oriented representation for efficient reinforcement learning , 2008, ICML '08.
[4] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[5] Murray Shanahan,et al. Towards Deep Symbolic Reinforcement Learning , 2016, ArXiv.
[6] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[7] Nando de Freitas,et al. Playing hard exploration games by watching YouTube , 2018, NeurIPS.
[8] Carlos Guestrin,et al. Generalizing plans to new environments in relational MDPs , 2003, IJCAI 2003.
[9] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[10] Christopher D. Rosin,et al. Nested Rollout Policy Adaptation for Monte Carlo Tree Search , 2011, IJCAI.
[11] Demis Hassabis,et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.
[12] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[13] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[14] Sergey Levine,et al. Deep visual foresight for planning robot motion , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[15] Alexei A. Efros,et al. Investigating Human Priors for Playing Video Games , 2018, ICML.
[16] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[17] Nan Jiang,et al. The Dependence of Effective Planning Horizon on Model Accuracy , 2015, AAMAS.
[18] Kamyar Azizzadenesheli,et al. Efficient Exploration Through Bayesian Deep Q-Networks , 2018, 2018 Information Theory and Applications Workshop (ITA).
[19] Andrew G. Barto,et al. Optimal learning: computational procedures for bayes-adaptive markov decision processes , 2002 .
[20] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[21] Joshua B. Tenenbaum,et al. Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.
[22] Tor Lattimore,et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning , 2017, NIPS.
[23] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[24] Shane Legg,et al. Noisy Networks for Exploration , 2017, ICLR.
[25] Benjamin Van Roy,et al. Why is Posterior Sampling Better than Optimism for Reinforcement Learning? , 2016, ICML.
[26] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[27] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[28] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[29] A. P. Hyper-parameters. Count-Based Exploration with Neural Density Models , 2017 .
[30] Peter Dayan,et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search , 2012, NIPS.
[31] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[32] Andrea Lockerd Thomaz,et al. Object focused q-learning for autonomous agents , 2013, AAMAS.
[33] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[34] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.
[35] Joshua B. Tenenbaum,et al. Human Learning in Atari , 2017, AAAI Spring Symposia.
[36] Shie Mannor,et al. Bayesian Reinforcement Learning: A Survey , 2015, Found. Trends Mach. Learn..
[37] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[38] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[39] Stefanie Tellex,et al. Deep Abstract Q-Networks , 2017, AAMAS.
[40] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[41] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[42] Thomas J. Walsh,et al. Knows what it knows: a framework for self-aware learning , 2008, ICML.
[43] Filip De Turck,et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning , 2016, NIPS.
[44] Ross A. Knepper,et al. DeepMPC: Learning Deep Latent Features for Model Predictive Control , 2015, Robotics: Science and Systems.