Open Loop Execution of Tree-Search Algorithms

In the context of stochastic planning with tree-search algorithms, where a generative model is available, we consider on-line planning algorithms that build trees in order to recommend an action. We investigate whether re-planning at subsequent decision steps can be avoided by directly using sub-trees as action recommenders. First, we propose a method for open-loop control via a new algorithm that decides, at each time step, whether to re-plan based on an analysis of the sub-tree's statistics. Second, we show that the probability of selecting a suboptimal action at any depth of the tree can be upper bounded and converges towards zero. Moreover, this upper bound decays logarithmically between subsequent depths. This leads to a distinction between node-wise optimality and state-wise optimality. Finally, we empirically demonstrate that our method achieves a compromise between loss of performance and computational gain.
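The sketch below illustrates the general idea of open-loop execution under stated assumptions: plan once, then at each step either descend into the previously built sub-tree or fall back to re-planning when a statistic of that sub-tree suggests it is unreliable. The names (`TreeNode`, `plan`, `subtree_is_reliable`, the `env.step` interface, and the visit-count threshold) are illustrative placeholders, not the paper's actual algorithm or criterion.

```python
from dataclasses import dataclass, field

# Minimal sketch of open-loop execution of a tree-search planner.
# All interfaces and thresholds below are assumptions for illustration.

@dataclass
class TreeNode:
    action: object = None                     # action leading to this node
    visits: int = 0                           # visit count of the node
    value: float = 0.0                        # empirical mean return
    children: list = field(default_factory=list)

def plan(state, budget=1000):
    """Placeholder for a tree-search planner (e.g. UCT) that returns a root
    node whose children correspond to the available actions."""
    raise NotImplementedError

def best_child(node):
    # Recommend the child with the highest visit count (a common choice).
    return max(node.children, key=lambda c: c.visits)

def subtree_is_reliable(node, min_visits=50):
    # Stand-in decision test on sub-tree statistics: re-plan if the
    # sub-tree is too thin to trust its recommendation.
    return node.visits >= min_visits and len(node.children) > 0

def open_loop_execution(env, initial_state, horizon=20):
    state = initial_state
    root = plan(state)                        # initial planning step
    for _ in range(horizon):
        if not subtree_is_reliable(root):
            root = plan(state)                # fall back to re-planning
        chosen = best_child(root)
        # Assumed generative-model interface: (next_state, reward, done).
        state, reward, done = env.step(state, chosen.action)
        if done:
            break
        root = chosen                         # descend: reuse the sub-tree
    return state
```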
