[1] Tor Lattimore et al. Optimally Confident UCB: Improved Regret for Finite-Armed Bandits, 2015, arXiv.
[2] Laurent Orseau et al. Universal Knowledge-Seeking Agents for Stochastic Environments, 2013, ALT.
[3] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[4] Marlos C. Machado et al. Learning Purposeful Behaviour in the Absence of Rewards, 2016, arXiv.
[5] Jan Leike et al. Nonparametric General Reinforcement Learning, 2016, arXiv.
[6] Malcolm J. A. Strens et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[7] Yi Sun et al. Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments, 2011, AGI.
[8] Gautam Reddy et al. Infomax Strategies for an Optimal Balance Between Exploration and Exploitation, 2016, Journal of Statistical Physics.
[9] Benjamin Van Roy et al. Learning to Optimize via Information-Directed Sampling, 2014, NIPS.
[10] Tom Schaul et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[11] Peter Auer et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[12] Filip De Turck et al. Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks, 2016, arXiv.
[13] Shane Legg et al. Human-level control through deep reinforcement learning, 2015, Nature.
[14] Jürgen Schmidhuber et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010), 2010, IEEE Transactions on Autonomous Mental Development.
[15] David Andre et al. Model based Bayesian Exploration, 1999, UAI.
[16] Alex Graves et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.