[1] John N. Tsitsiklis, et al. Bias and Variance Approximation in Value Function Estimates, 2007, Manag. Sci.
[2] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[3] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[4] Carl E. Rasmussen, et al. Gaussian Processes in Reinforcement Learning, 2003, NIPS.
[5] John N. Tsitsiklis, et al. Mean-Variance Optimization in Markov Decision Processes, 2011, ICML.
[6] Masashi Sugiyama, et al. Parametric Return Density Estimation for Reinforcement Learning, 2010, UAI.
[7] John N. Tsitsiklis, et al. Bias and variance in value function estimation, 2004, ICML.
[8] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, ArXiv.
[9] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[10] D. White. Mean, variance, and probabilistic criteria in finite Markov decision processes: A review, 1988.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[12] Shie Mannor, et al. Learning the Variance of the Reward-To-Go, 2016, J. Mach. Learn. Res.
[13] C. Rasmussen, et al. Improving PILCO with Bayesian Neural Network Dynamics Models, 2016.
[14] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[15] Catholijn M. Jonker, et al. Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning, 2017, ArXiv.
[16] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[17] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[18] Yarin Gal. Uncertainty in Deep Learning, 2016.
[19] José Miguel Hernández-Lobato, et al. Uncertainty Decomposition in Bayesian Neural Networks with Latent Variables, 2017, ArXiv:1706.08495.
[20] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[21] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.
[22] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[23] Sergey Levine, et al. Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models, 2015, ArXiv.
[24] Jan Peters, et al. Generalized exploration in policy search, 2017, Machine Learning.
[25] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[26] Ian Osband, et al. The Uncertainty Bellman Equation and Exploration, 2017, ICML.
[27] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[28] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[29] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[30] David J. C. MacKay. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.
[31] David Andre, et al. Model-based Bayesian Exploration, 1999, UAI.
[32] Shie Mannor, et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, 2003, ICML.
[33] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[34] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[35] Peter Dayan, et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search, 2012, NIPS.
[36] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[37] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[38] Ronen I. Brafman, et al. R-MAX: A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[39] Mohammad Ghavamzadeh, et al. Bayesian actor-critic algorithms, 2007, ICML.
[40] Mohammad Ghavamzadeh, et al. Bayesian Policy Gradient Algorithms, 2006, NIPS.