Successor Uncertainties: exploration and uncertainty in temporal difference learning
Sebastian Tschiatschek | David Janz | José Miguel Hernández-Lobato | Katja Hofmann | Jiri Hron