Andreas Krause | Johannes Kirschner | Felix Berkenkamp | Nikolay Nikolov
[1] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, ICML.
[2] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[3] Shane Legg, et al. Massively Parallel Methods for Deep Reinforcement Learning, 2015, ArXiv.
[4] Pieter Abbeel, et al. UCB and InfoGain Exploration via Q-Ensembles, 2017, ArXiv.
[5] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[6] Benjamin Van Roy, et al. Learning to Optimize via Information-Directed Sampling, 2014, NIPS.
[7] Yunhao Tang, et al. Exploration by Distributional Reinforcement Learning, 2018, IJCAI.
[8] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[9] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[10] Jürgen Schmidhuber, et al. Formal Theory of Fun and Creativity, 2010, ECML/PKDD.
[11] Sergey Levine, et al. Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models, 2015, ArXiv.
[12] Rémi Munos, et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.
[13] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, ArXiv.
[14] Andrea Zanette, et al. Information Directed Reinforcement Learning, 2017.
[15] Murray Shanahan, et al. Deep Reinforcement Learning with Risk-Seeking Exploration, 2018, SAB.
[16] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[17] Jasper Snoek, et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.
[18] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[19] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[20] Prabhat, et al. Scalable Bayesian Optimization Using Deep Neural Networks, 2015, ICML.
[21] David Hinkley, et al. Bootstrap Methods: Another Look at the Jackknife, 2008.
[22] Catholijn M. Jonker, et al. The Potential of the Return Distribution for Exploration in RL, 2018, ArXiv.
[23] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[24] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[25] Kamyar Azizzadenesheli, et al. Efficient Exploration Through Bayesian Deep Q-Networks, 2018, Information Theory and Applications Workshop (ITA).
[26] Kevin P. Murphy, et al. Machine Learning: A Probabilistic Perspective, 2012, Adaptive Computation and Machine Learning series.
[27] David M. Blei, et al. A Variational Analysis of Stochastic Gradient Algorithms, 2016, ICML.
[28] Chris Watkins, et al. Learning from Delayed Rewards, 1989.
[29] George E. Monahan, et al. A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms, 2007.
[30] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[31] Geoffrey E. Hinton, et al. Bayesian Learning for Neural Networks, 1995.
[32] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[33] Ian Osband, et al. The Uncertainty Bellman Equation and Exploration, 2017, ICML.
[34] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[35] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Foundations and Trends in Machine Learning.
[36] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[37] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.
[38] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[39] Catholijn M. Jonker, et al. Efficient Exploration with Double Uncertain Value Networks, 2017, ArXiv.
[40] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, Journal of Artificial Intelligence Research.
[41] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[42] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[43] Andreas Krause, et al. Information Directed Sampling and Bandits with Heteroscedastic Noise, 2018, COLT.
[44] H. Robbins, et al. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[45] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[46] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[47] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.
[48] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[49] Yuval Tassa, et al. Continuous Control with Deep Reinforcement Learning, 2015, ICLR.
[50] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[51] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.