Anders Jonsson | Michal Valko | Emilie Kaufmann | Pierre Ménard | Omar Darwiche Domingues | Édouard Leurent
[1] Aurélien Garivier, et al. KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints, 2018, J. Mach. Learn. Res.
[2] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[3] Rémi Munos, et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation, 2012, arXiv:1210.1136.
[4] Michal Valko, et al. Planning in entropy-regularized Markov decision processes and games, 2019, NeurIPS.
[5] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[6] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[7] Rémi Munos, et al. Open Loop Optimistic Planning, 2010, COLT.
[8] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[9] Rémi Munos, et al. Optimistic Planning in Markov Decision Processes Using a Generative Model, 2014, NIPS.
[10] Rémi Munos, et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, Found. Trends Mach. Learn.
[11] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[12] Mark H. M. Winands, et al. Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search, 2014, CGW@ECAI.