Combining Offline Models and Online Monte-Carlo Tree Search for Planning from Scratch

Planning in stochastic and partially observable environments is a central problem in artificial intelligence. A common approach is to first construct an accurate model of the environment and then plan with it. Although several recent approaches learn optimal behaviour under model uncertainty, they still require prior knowledge about the environment to guarantee performance. Exploiting the benefits of Predictive State Representations (PSRs) for state representation and model prediction, we introduce in this paper an approach for planning from scratch, in which an offline PSR model is first learned and then combined with online Monte-Carlo tree search for planning under model uncertainty. By comparing with the state-of-the-art approach to planning under model uncertainty, we demonstrate the effectiveness of the proposed approach and prove its convergence. We also test the effectiveness and scalability of our approach on the RockSample problem, which is infeasible for state-of-the-art BA-POMDP-based approaches.
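To make the two-stage idea concrete, the sketch below shows how a learned PSR model can serve as the generative simulator inside a UCT-style Monte-Carlo tree search, in the spirit of the offline-learning/online-planning combination described above. This is a minimal illustration, not the authors' implementation: the names (`PSRModel`, `Node`, `plan`, the exploration constant `c`, etc.) are invented for this example, the parameters `b1`, `b_inf`, and `B[(a, o)]` are assumed to come from an offline (e.g. spectral) learning step, and rewards are assumed to be a known function of each action-observation pair.

```python
import math
import random
import numpy as np


class PSRModel:
    """A learned PSR: predictive state b, normalizer b_inf, and one
    operator B[(a, o)] per action-observation pair (assumed learned offline)."""

    def __init__(self, b1, b_inf, B, observations, rewards):
        self.b1 = b1                  # initial predictive state (1-D array)
        self.b_inf = b_inf            # normalization vector (1-D array)
        self.B = B                    # {(action, obs): 2-D operator matrix}
        self.observations = observations
        self.rewards = rewards        # assumption: {(action, obs): scalar reward}

    def obs_prob(self, b, a, o):
        """Model's prediction P(o | history, a) from predictive state b."""
        return float(self.b_inf @ (self.B[(a, o)] @ b))

    def update(self, b, a, o):
        """Predictive-state update after executing a and observing o."""
        v = self.B[(a, o)] @ b
        return v / float(self.b_inf @ v)

    def sample(self, b, a):
        """Sample (observation, reward) from the model's one-step prediction."""
        p = np.array([max(self.obs_prob(b, a, o), 0.0)
                      for o in self.observations]) + 1e-12
        o = random.choices(self.observations, weights=(p / p.sum()))[0]
        return o, self.rewards[(a, o)]


class Node:
    """One search-tree node; children are indexed by (action, observation)."""

    def __init__(self):
        self.N = 0       # total visits
        self.Na = {}     # action -> visit count
        self.Qa = {}     # action -> mean discounted return
        self.kids = {}   # action -> {observation -> Node}


def rollout(model, b, actions, depth, gamma):
    """Uniform-random rollout through the PSR model to estimate a leaf value."""
    total, disc = 0.0, 1.0
    for _ in range(depth):
        a = random.choice(actions)
        o, r = model.sample(b, a)
        total += disc * r
        disc *= gamma
        b = model.update(b, a, o)
    return total


def simulate(model, b, node, actions, depth, c, gamma):
    """One UCT simulation: select by UCB1, sample the PSR, recurse, back up."""
    if depth == 0:
        return 0.0
    untried = [a for a in actions if a not in node.Na]
    if untried:                       # expand a new action first
        a = random.choice(untried)
        node.Na[a], node.Qa[a], node.kids[a] = 0, 0.0, {}
    else:                             # UCB1 over already-tried actions
        a = max(actions, key=lambda x: node.Qa[x]
                + c * math.sqrt(math.log(node.N) / node.Na[x]))
    o, r = model.sample(b, a)
    b2 = model.update(b, a, o)
    child = node.kids[a].setdefault(o, Node())
    if child.N == 0:                  # new leaf: evaluate with a rollout
        q = r + gamma * rollout(model, b2, actions, depth - 1, gamma)
    else:
        q = r + gamma * simulate(model, b2, child, actions, depth - 1, c, gamma)
    child.N += 1
    node.N += 1
    node.Na[a] += 1
    node.Qa[a] += (q - node.Qa[a]) / node.Na[a]  # incremental mean update
    return q


def plan(model, b, actions, depth=15, n_sims=1000, c=1.0, gamma=0.95):
    """Run n_sims simulations from predictive state b; return the greedy action."""
    root = Node()
    for _ in range(n_sims):
        simulate(model, b, root, actions, depth, c, gamma)
    return max(root.Qa, key=root.Qa.get)
```

The design mirrors POMCP-style planning: the PSR's predictive state plays the role of the belief state, so the same vector that is updated during the search is also the model's sufficient statistic for prediction, which is what lets an offline-learned PSR model plug directly into online tree search.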
