Pieter Abbeel | John Schulman | Szymon Sidor | Richard Y. Chen