Open Loop Execution of Tree-Search Algorithms
Erwan Lecarpentier | Charles Lesire | Guillaume Infantes | Emmanuel Rachelson