Efficient Exploration in Monte Carlo Tree Search using Human Action Abstractions

Monte Carlo Tree Search (MCTS) is a family of methods for planning in large domains. It focuses on finding a good action for a particular state, which makes its complexity independent of the size of the state space. However, such methods remain exponential in the branching factor. Effective application of MCTS therefore requires good heuristics to guide action selection during learning. In this paper we present a policy-guided approach that combines action abstractions, derived from human input, with MCTS to enable efficient exploration. Drawing on existing work in hierarchical reinforcement learning and interactive machine learning, we show how multi-step actions, represented as stochastic policies, can serve as effective action selection heuristics. We demonstrate the efficacy of our approach in the PacMan domain and highlight its advantages over traditional MCTS.
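
The idea can be pictured as biasing the tree policy's action selection with a prior taken from the human-derived option policy. The sketch below is a minimal illustration under assumed interfaces: the `option_policy(state)` function returning a distribution over actions, and the blend weight `beta`, are our own illustrative choices rather than the paper's exact formulation.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    state: object
    visits: int = 0
    value: float = 0.0                            # sum of rollout returns
    children: dict = field(default_factory=dict)  # action -> Node

def uct_score(parent, child, prior, c=1.4, beta=0.5):
    """UCB1 score with an additive bias from a human-derived policy prior."""
    if child.visits == 0:
        return float("inf")  # always try unvisited actions first
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    # The prior term nudges the search toward actions that the
    # human-supplied multi-step (option) policy would take here.
    return exploit + explore + beta * prior

def select_action(node, option_policy):
    # option_policy(state) -> {action: probability} is an assumed
    # interface for the stochastic policy derived from human input.
    priors = option_policy(node.state)
    return max(
        node.children,
        key=lambda a: uct_score(node, node.children[a], priors.get(a, 0.0)),
    )
```

Blending a knowledge-based term into the UCB1 score in this additive way mirrors established progressive-bias style modifications of MCTS; the difference suggested here is that the prior comes from human-provided action abstractions rather than hand-coded domain features.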
