Deep Exploration via Bootstrapped DQN
Ian Osband | Charles Blundell | Alexander Pritzel | Benjamin Van Roy
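The paper's core idea is to drive deep exploration by training K bootstrapped Q-value heads on a shared network torso: one head is sampled at the start of each episode and followed greedily (a Thompson-sampling-style policy), and each transition receives a random bootstrap mask deciding which heads learn from it. Below is a minimal PyTorch sketch of that idea, not the paper's implementation: the layer sizes, K = 10, the mask probability, and the classic Gym-style env.reset()/env.step() interface are illustrative assumptions rather than the paper's Atari settings.

import random

import torch
import torch.nn as nn

class BootstrappedQNet(nn.Module):
    """K Q-value heads sharing one torso (sizes here are illustrative)."""
    def __init__(self, obs_dim: int, n_actions: int, n_heads: int = 10):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.heads = nn.ModuleList(
            nn.Linear(64, n_actions) for _ in range(n_heads)
        )

    def forward(self, obs: torch.Tensor, head: int) -> torch.Tensor:
        return self.heads[head](self.torso(obs))

def bootstrap_masks(batch_size: int, n_heads: int, p: float = 0.5) -> torch.Tensor:
    # Bernoulli(p) mask per (transition, head): head k trains on a
    # transition only if its mask bit is 1, approximating a bootstrap
    # resample of the replay data for each head.
    return torch.bernoulli(torch.full((batch_size, n_heads), p))

def masked_td_loss(net, target_net, batch, gamma=0.99):
    obs, act, rew, next_obs, done, masks = batch  # masks: [B, n_heads]
    loss = torch.zeros(())
    for k in range(len(net.heads)):
        q = net(obs, k).gather(1, act.unsqueeze(1)).squeeze(1)
        with torch.no_grad():  # per-head TD target from a frozen copy
            best_next = target_net(next_obs, k).max(dim=1).values
            target = rew + gamma * (1.0 - done) * best_next
        loss = loss + (masks[:, k] * (q - target) ** 2).mean()
    return loss

def run_episode(env, net, n_heads):
    # Sample one head for the whole episode and act greedily under it;
    # committing to a single sampled value function is what yields
    # temporally extended ("deep") exploration.
    head = random.randrange(n_heads)
    obs, done = env.reset(), False
    while not done:
        with torch.no_grad():
            q_values = net(torch.as_tensor(obs, dtype=torch.float32), head)
        obs, reward, done, info = env.step(q_values.argmax().item())

In the paper itself the shared torso is the DQN convnet and targets come from a periodically updated target network; the paper also reports that even sharing all data across heads (mask probability 1) retains much of the benefit, since head diversity already arises from random initialization.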
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] D. Freedman, et al. Some Asymptotic Theory for the Bootstrap, 1981.
[3] B. Efron. The Jackknife, the Bootstrap, and Other Resampling Plans, 1987.
[4] Raul Cano. On the Bayesian Bootstrap, 1992.
[5] S. T. Buckland, et al. An Introduction to the Bootstrap, 1994.
[6] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[7] Apostolos Burnetas, et al. Optimal Adaptive Policies for Markov Decision Processes, 1997, Math. Oper. Res.
[8] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[9] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[10] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[11] Sham M. Kakade, et al. On the Sample Complexity of Reinforcement Learning, 2003.
[12] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[13] Michael L. Littman, et al. A Theoretical Analysis of Model-Based Interval Estimation, 2005, ICML.
[14] Tao Wang, et al. Bayesian Sparse Sampling for On-line Reward Optimization, 2005, ICML.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[16] M. Kenward, et al. An Introduction to the Bootstrap, 2007.
[17] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[18] Alex Graves, et al. Practical Variational Inference for Neural Networks, 2011, NIPS.
[19] Purnamrita Sarkar, et al. A Scalable Bootstrap for Massive Data, 2011, arXiv:1112.5016.
[20] Peter Dayan, et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search, 2012, NIPS.
[21] Geoffrey E. Hinton, et al. ImageNet Classification with Deep Convolutional Neural Networks, 2012, Commun. ACM.
[22] A. Owen, et al. Bootstrapping Data Arrays of Arbitrary Order, 2011, arXiv:1106.2125.
[23] Zheng Wen, et al. Efficient Exploration and Value Function Generalization in Deterministic Systems, 2013, NIPS.
[24] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[25] Benjamin Van Roy, et al. Near-optimal Reinforcement Learning in Factored MDPs, 2014, NIPS.
[26] Peter Dayan, et al. Bayes-Adaptive Simulation-based Search with Value Function Approximation, 2014, NIPS.
[27] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[28] Nitish Srivastava, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, 2014, J. Mach. Learn. Res.
[29] Rémi Munos, et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, Found. Trends Mach. Learn.
[30] Benjamin Van Roy, et al. Model-based Reinforcement Learning and the Eluder Dimension, 2014, NIPS.
[31] Sergey Levine, et al. Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models, 2015, arXiv.
[32] Shie Mannor, et al. Thompson Sampling for Learning Parameterized Markov Decision Processes, 2014, COLT.
[33] Csaba Szepesvári, et al. Bayesian Optimal Control of Smoothly Parameterized Systems, 2015, UAI.
[34] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, ICML.
[35] Ryan P. Adams, et al. Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks, 2015, ICML.
[36] Benjamin Van Roy, et al. Bootstrapped Thompson Sampling and Deep Exploration, 2015, arXiv.
[37] Shane Legg, et al. Massively Parallel Methods for Deep Reinforcement Learning, 2015, arXiv.
[38] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, arXiv.
[39] Shane Legg, et al. Human-level Control through Deep Reinforcement Learning, 2015, Nature.
[40] Diederik P. Kingma, et al. Variational Dropout and the Local Reparameterization Trick, 2015, NIPS.
[41] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[42] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[43] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[44] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[45] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[46] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[47] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[48] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[49] Yee Whye Teh, et al. Distributed Bayesian Learning with Stochastic Natural Gradient Expectation Propagation and the Posterior Server, 2015, J. Mach. Learn. Res.
[50] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.