Monte Carlo Tree Search Planning for Continuous Action and State Spaces (short paper)

Sequential decision-making in real-world environments is an important problem in artificial intelligence and robotics. In the last decade, reinforcement learning has provided effective solutions in small and simulated environments, but it has also shown limitations in large, real-world domains characterized by continuous state and action spaces. In this work, we evaluate state-of-the-art algorithms based on Monte Carlo Tree Search planning in continuous state/action spaces and propose a first version of a new algorithm based on action widening. The algorithms are evaluated on a synthetic domain in which the agent must steer a car through a narrow curve, reaching the goal in the shortest possible time without driving off the road. We show that the proposed method outperforms the state-of-the-art techniques.
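
As an illustration of the action-widening idea mentioned above, the sketch below shows the standard progressive-widening rule commonly used to handle continuous actions in MCTS: a node may expand a newly sampled action only while its number of child actions stays below k * N(s)^alpha, and otherwise selects among the already expanded actions with UCB1. The constants, the `sample_action` sampler, and the node bookkeeping are illustrative assumptions for this sketch, not the algorithm proposed in the paper.

```python
import math
import random

class Node:
    def __init__(self):
        self.visits = 0
        self.children = {}  # action -> {"node": Node, "n": visit count, "q": mean return}

def choose_action(node, sample_action, k=2.0, alpha=0.5, c=1.0):
    """Pick the action to simulate from `node` under progressive widening."""
    # Widening step: sample a fresh continuous action while the number of
    # expanded actions is below k * N(s)^alpha.
    if len(node.children) < k * max(node.visits, 1) ** alpha:
        a = sample_action()
        node.children[a] = {"node": Node(), "n": 0, "q": 0.0}
        return a
    # Otherwise select among the already expanded actions with UCB1.
    log_n = math.log(max(node.visits, 1))
    return max(
        node.children,
        key=lambda a: node.children[a]["q"]
        + c * math.sqrt(log_n / node.children[a]["n"]),
    )

def backup(node, action, reward):
    """Update the visit count and running mean return for (node, action)."""
    node.visits += 1
    stats = node.children[action]
    stats["n"] += 1
    stats["q"] += (reward - stats["q"]) / stats["n"]

if __name__ == "__main__":
    # Hypothetical usage: steering angles in [-0.5, 0.5] rad as the continuous action space.
    root = Node()
    a = choose_action(root, sample_action=lambda: random.uniform(-0.5, 0.5))
    backup(root, a, reward=1.0)
```

In this scheme the branching factor grows sublinearly with the node's visit count, so search effort concentrates on promising regions of the continuous action space instead of being spread over an unbounded set of actions.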
