Maximum Entropy Monte-Carlo Planning
Dale Schuurmans | Ruitong Huang | Chenjun Xiao | Jincheng Mei | Martin Müller
[1] Tristan Cazenave et al. Sequential Halving Applied to Trees, 2022, IEEE Transactions on Computational Intelligence and AI in Games.
[2] Carolyn Pillers Dobler et al. Mathematical Statistics, 2002.
[3] Eduardo F. Morales et al. An Introduction to Reinforcement Learning, 2011.
[4] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[5] Demis Hassabis et al. Mastering the game of Go without human knowledge, 2017, Nature.
[6] Martin Müller et al. Memory-Augmented Monte Carlo Tree Search, 2018, AAAI.
[7] Dana S. Nau et al. An Analysis of Forward Pruning, 1994, AAAI.
[8] Martin J. Wainwright et al. High-Dimensional Statistics, 2019.
[9] Sergey Levine et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[10] Oren Somekh et al. Almost Optimal Exploration in Multi-Armed Bandits, 2013, ICML.
[11] Rémi Coulom et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.
[12] Rémi Munos et al. Bandit Algorithms for Tree Search, 2007, UAI.
[13] Yishay Mansour et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[14] Peter Stone et al. On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search, 2016, ICML.
[15] Dale Schuurmans et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[16] David Tolpin et al. MCTS Based on Simple Regret, 2012, AAAI.
[17] Sergey Levine et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[18] Csaba Szepesvári et al. Bandit Based Monte-Carlo Planning, 2006, ECML.