Gaussian Approximation for Bias Reduction in Q-Learning
[1] Razvan Pascanu et al. Spectral Normalisation for Deep Reinforcement Learning: an Optimisation Perspective, 2021, ICML.
[2] Jaime Fernández del Río et al. Array programming with NumPy, 2020, Nature.
[3] Daniel Guo et al. Agent57: Outperforming the Atari Human Benchmark, 2020, ICML.
[4] Martha White et al. Maxmin Q-learning: Controlling the Estimation Bias of Q-learning, 2020, ICLR.
[5] Andrea Bonarini et al. MushroomRL: Simplifying Reinforcement Learning Research, 2020, J. Mach. Learn. Res.
[6] Natalia Gimelshein et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[7] Marcello Restelli et al. Exploiting Action-Value Uncertainty to Drive Exploration in Reinforcement Learning, 2019, International Joint Conference on Neural Networks (IJCNN).
[8] Volodymyr Kuleshov et al. Calibrated Model-Based Deep Reinforcement Learning, 2019, ICML.
[9] Tian Tian et al. MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments, 2019.
[10] Marlos C. Machado et al. Generalization and Regularization in DQN, 2018, ArXiv.
[11] Mark Collier et al. Deep Contextual Multi-armed Bandits, 2018, ArXiv.
[12] Albin Cassirer et al. Randomized Prior Functions for Deep Reinforcement Learning, 2018, NeurIPS.
[13] Herke van Hoof et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[14] Jasper Snoek et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.
[15] Marc G. Bellemare et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[16] Tom Schaul et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[17] Philip Bachman et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[18] Marlos C. Machado et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[19] Mykel J. Kochenderfer et al. Weighted Double Q-learning, 2017, IJCAI.
[20] Marc G. Bellemare et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[21] Alex Kendall et al. Concrete Dropout, 2017, NIPS.
[22] Zheng Wen et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.
[23] Sergey Levine et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[24] Marcello Restelli et al. Estimating the Maximum Expected Value in Continuous Reinforcement Learning Problems, 2017, AAAI.
[25] Sergey Levine et al. Uncertainty-Aware Reinforcement Learning for Collision Avoidance, 2017, ArXiv.
[26] Kavosh Asadi et al. An Alternative Softmax Operator for Reinforcement Learning, 2016, ICML.
[27] Charles Blundell et al. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, 2016, NIPS.
[28] Nahum Shimkin et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[29] Marcello Restelli et al. Estimating Maximum Expected Value through Gaussian Approximation, 2016, ICML.
[30] J. Schulman et al. OpenAI Gym, 2016, ArXiv.
[31] Benjamin Van Roy et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[32] Luca Martino et al. Effective sample size for importance sampling based on discrepancy measures, 2016, Signal Process.
[33] Roy Fox et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[34] Marc G. Bellemare et al. Increasing the Action Gap: New Operators for Reinforcement Learning, 2015, AAAI.
[35] Tom Schaul et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[36] Tom Schaul et al. Prioritized Experience Replay, 2015, ICLR.
[37] David Silver et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[38] Yuval Tassa et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[39] Sergey Levine et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2015, ArXiv.
[40] Diederik P. Kingma et al. Variational Dropout and the Local Reparameterization Trick, 2015, NIPS.
[41] Zoubin Ghahramani et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[42] Marc G. Bellemare et al. Human-level control through deep reinforcement learning, 2015, Nature.
[43] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[44] Tao Qin et al. Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising, 2013, NIPS.
[45] Warren B. Powell et al. Bias-corrected Q-learning to control max-operator bias in Q-learning, 2013, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[46] Hado van Hasselt et al. Estimating the Maximum Expected Value: An Analysis of (Nested) Cross Validation and the Maximum Sample Average, 2013, ArXiv.
[47] Neil D. Lawrence et al. Deep Gaussian Processes, 2012, AISTATS.
[48] Marc G. Bellemare et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[49] Hado van Hasselt et al. Double Q-learning, 2010, NIPS.
[50] Masashi Sugiyama et al. Parametric Return Density Estimation for Reinforcement Learning, 2010, UAI.
[51] R. Tibshirani et al. A bias correction for the minimum error rate in cross-validation, 2009, arXiv:0908.2904.
[52] Shimon Whiteson et al. A theoretical and empirical analysis of Expected Sarsa, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[53] Robert L. Winkler et al. The Optimizer's Curse: Skepticism and Postdecision Surprise in Decision Analysis, 2006, Manag. Sci.
[54] Pierre Geurts et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[55] Carl E. Rasmussen et al. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), 2005.
[56] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[57] Stuart J. Russell et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[58] Ben J. A. Kröse et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[59] A. Cohen et al. Estimation of the Larger of Two Normal Means, 1968.
[60] R. Bellman et al. On the Theory of Dynamic Programming, 1952, Proceedings of the National Academy of Sciences of the United States of America.
[61] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[62] Marcello Restelli et al. Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters, 2019, NeurIPS.
[63] Roy Fox. Toward Provably Unbiased Temporal-Difference Value Estimation, 2019.
[64] C. Rasmussen et al. Improving PILCO with Bayesian Neural Network Dynamics Models, 2016.
[65] Nitish Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[66] Kenji Doya et al. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[67] Sebastian Thrun et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[68] D. BhaeiyalIshwaei et al. Non-existence of unbiased estimators of ordered parameters, 1985.