Guided Monte Carlo Tree Search for Planning in Learned Environments

Monte Carlo tree search (MCTS) is a sampling- and simulation-based technique for searching large search spaces that contain both decision nodes and probabilistic events. The technique has recently become popular due to its successful application to games, e.g. Poker (Van den Broeck et al., 2009) and Go (Coulom, 2006; Chaslot et al., 2006; Gelly and Silver, 2012). Such games have known rules, and the alternation between self-moves and non-deterministic events or opponent moves can be used to prune uninteresting branches. In this paper we study a real-world setting where the processes in the domain have a high degree of uncertainty and the need for longer-term planning implies a sequence of (planning) decisions without any intermediate feedback. Fortunately, in contrast to the combinatorial complexity of strategic games, many real-world environments can be approximated by efficient algorithms over the short term. This paper proposes an MCTS variant using a new type of prior information based on estimating the effects of part of the world and explores its application to the problem of hospital planning, where machine learning algorithms can be used to predict the length of stay of patients for each of the different stages of their recovery.

[1] H. Jaap van den Herik, et al. Progressive Strategies for Monte-Carlo Tree Search, 2008.

[2] Rémi Coulom, et al. Computing "Elo Ratings" of Move Patterns in the Game of Go, 2007, J. Int. Comput. Games Assoc.

[3] Guy Van den Broeck, et al. Monte-Carlo Tree Search in Poker Using Expected Reward Distributions, 2009, ACML.

[4] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[5] I. Toumpoulis, et al. Does EuroSCORE predict length of stay and specific postoperative complications after cardiac surgery?, 2005, European journal of cardio-thoracic surgery : official journal of the European Association for Cardio-thoracic Surgery.

[6] Kenneth Foster, et al. New math, 2009, IEEE Spectrum.

[7] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.

[8] S. Lemeshow, et al. European system for cardiac operative risk evaluation (EuroSCORE), 1999, European journal of cardio-thoracic surgery : official journal of the European Association for Cardio-thoracic Surgery.

[9] Jean-Marie Aerts, et al. Computerized prediction of intensive care unit discharge after cardiac surgery: development and validation of a Gaussian processes model, 2011, BMC Medical Informatics Decis. Mak.

[10] Bruno Bouzy, et al. Monte-Carlo strategies for computer Go, 2006.

[11] Michèle Sebag, et al. The grand challenge of computer Go, 2012, Commun. ACM.

[12] Maurice Bruynooghe, et al. Mining data from intensive care patients, 2007, Adv. Eng. Informatics.

[13] David Silver, et al. Combining online and offline knowledge in UCT, 2007, ICML '07.

[14] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.

[15] Christopher K. I. Williams, et al. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), 2005.

[16] Rémi Coulom, et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, 2006, Computers and Games.

[17] Michèle Sebag, et al. Pilot, Rollout and Monte Carlo Tree Search Methods for Job Shop Scheduling, 2012, LION.