Paper information - Monte Carlo Tree Search in Continuous Action Spaces with Execution Uncertainty - 字舞流文

Monte Carlo Tree Search in Continuous Action Spaces with Execution Uncertainty

Real-world applications of artificial intelligence often require agents to sequentially choose actions from continuous action spaces with execution uncertainty. When good actions are sparse, domain knowledge is often used to identify a discrete set of promising actions. These actions and their uncertain effects are typically evaluated using a recursive search procedure. Reducing the problem to a discrete search problem causes severe limitations, notably that it does not exploit all of the sampled outcomes when evaluating actions and does not use outcomes to help find new actions outside the original set. We propose a new Monte Carlo tree search (MCTS) algorithm specifically designed for exploiting an execution model in this setting. Using kernel regression, it generalizes the information about action quality between actions and to unexplored parts of the action space. In a high-fidelity simulator of the Olympic sport of curling, we show that this approach significantly outperforms existing MCTS methods.
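The abstract does not spell out how kernel regression shares information between actions, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a one-dimensional action, a Gaussian kernel, and a Nadaraya-Watson-style weighted average; the names `gaussian_kernel`, `kernel_value_estimate`, and the `bandwidth` parameter are hypothetical and chosen for illustration.

```python
import math

def gaussian_kernel(a, b, bandwidth):
    """Unnormalized Gaussian similarity between two scalar actions (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / bandwidth) ** 2)

def kernel_value_estimate(query, actions, returns, bandwidth=0.5):
    """Kernel-weighted average of sampled returns around `query`.

    Returns (estimate, total_weight). The total weight acts as an
    effective sample count near the queried action.
    """
    weights = [gaussian_kernel(query, a, bandwidth) for a in actions]
    total = sum(weights)
    if total == 0.0:
        # No sampled outcome is close enough to say anything about `query`.
        return 0.0, 0.0
    estimate = sum(w * r for w, r in zip(weights, returns)) / total
    return estimate, total

# Hypothetical sampled outcomes: executed action and the return observed for it.
actions = [0.0, 1.0, 2.0]
returns = [0.0, 1.0, 0.0]

# Estimate at a sampled action and at an action that was never tried directly.
value_at_sample, weight_at_sample = kernel_value_estimate(1.0, actions, returns)
value_between, weight_between = kernel_value_estimate(0.5, actions, returns)
```

In this sketch every sampled outcome contributes to the estimate of every nearby action, which is the property the abstract contrasts with purely discrete search. The bandwidth controls how far information spreads and would need tuning for a real domain.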

Michael H. Bowling | Viliam Lisý | Timothy Yee
