Optimal Planning Over Long and Infinite Horizons for Achieving Independent Partially-Observable Tasks That Evolve Over Time

We focus on long-sighted planning for a class of problems with multiple independent tasks that are partially observable and evolve over time. An example of this class is a robot waiting on multiple tables (the tasks) in a restaurant, where each customer’s satisfaction is partially observable and evolves over time. Our recent work exploits the structure of these problems, namely the independence between the tasks, to plan optimally and efficiently for a short, fixed planning horizon. Selecting the right planning horizon is challenging: an overly short horizon may yield a low-quality solution, while a longer horizon quickly becomes computationally impractical. In this paper, we address this challenge by extending the recent work to plan efficiently over long fixed-length horizons without discounting and over infinite-length horizons with discounting. The key idea is to compute lower and upper bounds on the value of an optimal solution for variable horizons, which allows us to terminate the search early while still guaranteeing optimality. We present the algorithm, analyze its theoretical properties, and demonstrate its efficiency on the waiting-tables domain.
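To make the bounding idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a small, fully observable, discounted MDP rather than the multi-task partially observable setting, and the names `plan_with_horizon_bounds`, `P`, `R`, `gamma`, and `start` are hypothetical. It grows the horizon one step at a time, brackets each action's infinite-horizon value using the smallest and largest immediate rewards, and stops as soon as one action's lower bound is at least every other action's upper bound, at which point that action is provably optimal.

```python
def plan_with_horizon_bounds(P, R, gamma, start, max_horizon=1000):
    """Pick a provably optimal first action for a discounted MDP (illustrative only).

    P[a][s][t] is the probability of moving from state s to t under action a.
    R[a][s] is the immediate reward for taking action a in state s.
    Assumes 0 < gamma < 1 and max_horizon >= 1.
    Returns (action, horizon_used, lower_bound, upper_bound).
    """
    n_actions, n_states = len(R), len(R[0])
    r_min = min(min(row) for row in R)
    r_max = max(max(row) for row in R)
    V = [0.0] * n_states  # optimal value with zero steps to go

    for h in range(1, max_horizon + 1):
        # Q[a][s]: value of taking a in s, then acting optimally for h - 1 steps.
        Q = [[R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for s in range(n_states)]
             for a in range(n_actions)]
        V = [max(Q[a][s] for a in range(n_actions)) for s in range(n_states)]

        # Everything earned after step h lies between these two tail values.
        tail_lo = gamma ** h * r_min / (1 - gamma)
        tail_hi = gamma ** h * r_max / (1 - gamma)
        lower = [Q[a][start] + tail_lo for a in range(n_actions)]
        upper = [Q[a][start] + tail_hi for a in range(n_actions)]

        best = max(range(n_actions), key=lambda a: lower[a])
        # Early termination: no other action can beat `best` at any longer horizon.
        if all(lower[best] >= upper[a] for a in range(n_actions) if a != best):
            break

    return best, h, lower[best], upper[best]


# Toy problem: in state 0, action 0 stays put and earns 1; action 1 moves to
# an absorbing state 1 where nothing is earned.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 0.0]]
action, horizon, lo, hi = plan_with_horizon_bounds(P, R, gamma=0.4, start=0)
```

Because the gap between the bounds shrinks geometrically with the horizon, the loop in this sketch stops once the gap falls below the value difference between the best and second-best actions, which can happen well before the finite-horizon values themselves converge.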
