Solving Partially Observable Stochastic Shortest-Path Games

We study the two-player zero-sum extension of the partially observable stochastic shortest-path problem in which one agent has only partial information about the environment. We formulate this problem as a partially observable stochastic game (POSG): given a set of target states and a negative reward for each transition, the player with imperfect information maximizes the expected undiscounted total reward until a target state is reached. The second player, who has perfect information, aims for the opposite. We base our formalism on POSGs with one-sided observability (OS-POSGs) and make the following contributions: (1) we introduce a novel heuristic search value iteration algorithm that iteratively solves depth-limited variants of the game, (2) we derive a bound on the depth that guarantees arbitrary precision, (3) we propose a novel upper-bound estimation that allows early termination, and (4) we experimentally evaluate the algorithm on a pursuit-evasion game.
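To make the idea of depth-limited variants concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes that both players' strategies are fixed, so that the game reduces to a fully observable Markov chain, and it evaluates the expected undiscounted total reward over at most a given number of steps. The three-state chain, its probabilities, and the names `depth_limited_values`, `transitions`, `rewards`, and `targets` are hypothetical and chosen only for illustration.

```python
def depth_limited_values(transitions, rewards, targets, depth):
    """Expected undiscounted total reward collected in at most `depth` steps.

    Assumption: the players' strategies are fixed, so `transitions` describes
    a plain Markov chain (state -> {successor: probability}).
    """
    values = {state: 0.0 for state in transitions}
    for _ in range(depth):
        updated = {}
        for state, successors in transitions.items():
            if state in targets:
                # Reaching a target ends the game; no further reward accrues.
                updated[state] = 0.0
            else:
                updated[state] = rewards[state] + sum(
                    prob * values[succ] for succ, prob in successors.items()
                )
        values = updated
    return values


# Hypothetical chain: every non-target transition costs 1 (reward -1).
transitions = {
    "a": {"a": 0.5, "b": 0.5},
    "b": {"a": 0.2, "goal": 0.8},
    "goal": {"goal": 1.0},
}
rewards = {"a": -1.0, "b": -1.0, "goal": 0.0}
targets = {"goal"}

for depth in (1, 5, 20, 200):
    print(depth, depth_limited_values(transitions, rewards, targets, depth))
```

In this toy chain the depth-limited values decrease as the depth grows and approach the infinite-horizon values, which can be checked by hand from the two linear equations for states "a" and "b". This mirrors, in a much simpler setting, why a sufficiently large depth can approximate the undiscounted value to any chosen precision.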
