Monte-Carlo Tree Search by Best Arm Identification
暂无分享,去创建一个
[1] Tristan Cazenave,et al. Ieee Transactions on Computational Intelligence and Ai in Games 1 Sequential Halving Applied to Trees , 2022 .
[2] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[3] Matthew Malloy,et al. lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits , 2013, COLT.
[4] R. Munos,et al. Best Arm Identification in Multi-Armed Bandits , 2010, COLT.
[5] Jonathan Schaeffer,et al. Best-First Fixed-Depth Minimax Algorithms , 1996, J. Int. Comput. Games Assoc..
[6] Aurélien Garivier,et al. Optimal Best Arm Identification with Fixed Confidence , 2016, COLT.
[7] Ambuj Tewari,et al. PAC Subset Selection in Stochastic Multi-armed Bandits , 2012, ICML.
[8] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[9] Mark H. M. Winands,et al. Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search , 2014, CGW@ECAI.
[10] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..
[11] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[12] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[13] Alessandro Lazaric,et al. Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence , 2012, NIPS.
[14] Eiji Takimoto,et al. Efficient Sampling Method for Monte Carlo Tree Search Problem , 2014, IEICE Trans. Inf. Syst..
[15] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[16] Oren Somekh,et al. Almost Optimal Exploration in Multi-Armed Bandits , 2013, ICML.
[17] Wouter M. Koolen,et al. Maximin Action Identification: A New Bandit Framework for Games , 2016, COLT.
[18] T. L. Lai Andherbertrobbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .
[19] Aurélien Garivier,et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models , 2014, J. Mach. Learn. Res..
[20] Shivaram Kalyanakrishnan,et al. Information Complexity in Bandit Subset Selection , 2013, COLT.
[21] Lucian Busoniu,et al. An analysis of optimistic, best-first search for minimax sequential decision making , 2014, 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[22] R. Munos,et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation , 2012, 1210.1136.
[23] Csaba Szepesvári,et al. Structured Best Arm Identification with Fixed Confidence , 2017, ALT.