Guy Van den Broeck | Ji Liu | Jianshu Chen | Yitao Liang | Anji Liu
[1] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[2] Mark H. M. Winands, et al. Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search, 2014, CGW@ECAI.
[3] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[4] Richard B. Segal, et al. On the Scalability of Parallel UCT, 2010, Computers and Games.
[5] T. Cazenave, et al. On the Parallelization of UCT, 2007.
[6] Rémi Munos, et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, Found. Trends Mach. Learn.
[7] Carmel Domshlak, et al. Simple Regret Optimization in Online Planning for Markov Decision Processes, 2012, J. Artif. Intell. Res.
[8] H. Jaap van den Herik, et al. Parallel Monte-Carlo Tree Search, 2008, Computers and Games.
[9] S. Shankar Sastry, et al. A Multi-Armed Bandit Approach for Online Expert Selection in Markov Decision Processes, 2017, ArXiv.
[10] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[11] Honglak Lee, et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, 2014, NIPS.
[12] Yu Zhai, et al. Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search, 2020, ICLR.
[13] Jan Willemson, et al. Improved Monte-Carlo Search, 2006.
[14] V. V. Buldygin, et al. Sub-Gaussian random variables, 1980.
[15] P. Cowling, et al. Determinization in Monte-Carlo Tree Search for the card game Dou Di Zhu, 2011.
[16] Carmel Domshlak, et al. On MABs and Separation of Concerns in Monte-Carlo Planning for MDPs, 2014, ICAPS.
[17] Ikuo Takeuchi, et al. Parallel Monte-Carlo Tree Search with Simulation Servers, 2010, International Conference on Technologies and Applications of Artificial Intelligence.
[18] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[19] Akihiro Kishimoto, et al. Scalable Distributed Monte-Carlo Tree Search, 2011, SOCS.
[20] Osamu Watanabe, et al. Evaluating Root Parallelization in Go, 2010, IEEE Transactions on Computational Intelligence and AI in Games.
[21] David Tolpin, et al. Selecting Computations: Theory and Applications, 2012, UAI.
[22] Thomas Hérault, et al. Scalability and Parallelization of Monte-Carlo Tree Search, 2010, Computers and Games.
[23] Erik Ragnar Poromaa. Crushing Candy Crush: Predicting Human Success Rate in a Mobile Game using Monte-Carlo Tree Search, 2017.
[24] Sylvain Gelly, et al. Exploration exploitation in Go: UCT for Monte-Carlo Go, 2006, NIPS.
[25] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[26] Rémi Munos, et al. Pure exploration in finitely-armed and continuous-armed bandits, 2011, Theor. Comput. Sci.
[27] Eshcar Hillel, et al. Distributed Exploration in Multi-Armed Bandits, 2013, NIPS.
[28] Wouter M. Koolen, et al. Monte-Carlo Tree Search by Best Arm Identification, 2017, NIPS.
[29] Qing Zhao, et al. Distributed Learning in Multi-Armed Bandit With Multiple Players, 2009, IEEE Transactions on Signal Processing.
[30] David Silver, et al. Monte-Carlo tree search and rapid action value estimation in computer Go, 2011, Artif. Intell.
[31] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[32] Sam Devlin, et al. Combining Gameplay Data with Monte Carlo Tree Search to Emulate Human Play, 2016, AIIDE.
[33] Yelong Shen, et al. M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search, 2018, NeurIPS.
[34] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[35] Varun Kanade, et al. Decentralized Cooperative Stochastic Bandits, 2018, NeurIPS.
[36] Yoshimasa Tsuruoka, et al. Regulation of exploration for simple regret minimization in Monte-Carlo tree search, 2015, IEEE Conference on Computational Intelligence and Games (CIG).
[37] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[38] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.
[39] Nicolas Jouandeau, et al. A Parallel Monte-Carlo Tree Search Algorithm, 2008, Computers and Games.
[40] David Tolpin, et al. MCTS Based on Simple Regret, 2012, AAAI.
[41] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.