Open Loop Execution of Tree-Search Algorithms

In the context of stochastic planning with tree-search algorithms, where a generative model is available, we consider on-line planning algorithms that build trees in order to recommend an action. We investigate whether re-planning at subsequent decision steps can be avoided by directly using sub-trees as action recommenders. First, we propose a method for open-loop control via a new algorithm that decides, at each time step, whether to re-plan based on an analysis of the sub-tree's statistics. Second, we show that the probability of selecting a suboptimal action at any depth of the tree can be upper bounded and converges towards zero. Moreover, this upper bound decays logarithmically between subsequent depths. This leads to a distinction between node-wise optimality and state-wise optimality. Finally, we empirically demonstrate that our method achieves a compromise between loss of performance and computational gain.
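The sketch below illustrates the general idea of open-loop execution under stated assumptions: plan once, then at each step either descend into the previously built sub-tree or fall back to re-planning when a statistic of that sub-tree suggests it is unreliable. The names (`TreeNode`, `plan`, `subtree_is_reliable`, the `env.step` interface, and the visit-count threshold) are illustrative placeholders, not the paper's actual algorithm or criterion.

```python
from dataclasses import dataclass, field

# Minimal sketch of open-loop execution of a tree-search planner.
# All interfaces and thresholds below are assumptions for illustration.

@dataclass
class TreeNode:
    action: object = None                     # action leading to this node
    visits: int = 0                           # visit count of the node
    value: float = 0.0                        # empirical mean return
    children: list = field(default_factory=list)

def plan(state, budget=1000):
    """Placeholder for a tree-search planner (e.g. UCT) that returns a root
    node whose children correspond to the available actions."""
    raise NotImplementedError

def best_child(node):
    # Recommend the child with the highest visit count (a common choice).
    return max(node.children, key=lambda c: c.visits)

def subtree_is_reliable(node, min_visits=50):
    # Stand-in decision test on sub-tree statistics: re-plan if the
    # sub-tree is too thin to trust its recommendation.
    return node.visits >= min_visits and len(node.children) > 0

def open_loop_execution(env, initial_state, horizon=20):
    state = initial_state
    root = plan(state)                        # initial planning step
    for _ in range(horizon):
        if not subtree_is_reliable(root):
            root = plan(state)                # fall back to re-planning
        chosen = best_child(root)
        # Assumed generative-model interface: (next_state, reward, done).
        state, reward, done = env.step(state, chosen.action)
        if done:
            break
        root = chosen                         # descend: reuse the sub-tree
    return state
```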
