The Robot Routing Problem for Collecting Aggregate Stochastic Rewards

We propose a new model for formalizing reward collection problems on graphs with dynamically generated rewards, which may appear and disappear according to a stochastic model. The robot routing problem is modeled as a graph whose nodes are stochastic processes generating potential rewards over discrete time. Each node generates rewards according to its stochastic process, and at each step an existing reward disappears with a given probability. The edges of the graph encode the (unit-distance) paths between the rewards' locations. On visiting a node, the robot collects the reward accumulated at that node up to that time, but traveling between nodes takes time. The optimization question asks for an optimal (or epsilon-optimal) path that maximizes the expected collected reward. We consider both the finite-horizon and the infinite-horizon robot routing problems. For the finite horizon, the goal is to maximize the total expected reward, while for the infinite horizon we consider limit-average objectives. We study the computational and strategy complexity of these problems, establish NP-hardness lower bounds, and show that optimal strategies require memory in general. We also provide an algorithm for computing epsilon-optimal infinite paths for arbitrary epsilon > 0.
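To make the finite-horizon objective concrete, the following is a minimal sketch, not the paper's algorithm. It evaluates the expected reward of one fixed path under simplifying assumptions that the abstract does not state: each node generates a fixed expected amount of new reward per step (`rate`), each existing reward disappears independently with a per-step probability (`decay`), all nodes start empty at time 0, and the robot occupies `path[t]` at step `t`. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def expected_accumulated(rate: float, decay: float, steps: int) -> float:
    """Expected reward at a node `steps` steps after it was last emptied.

    Assumption: reward generated j steps ago has survived j decay rounds,
    so it contributes rate * (1 - decay)**j in expectation.
    """
    return sum(rate * (1.0 - decay) ** j for j in range(steps))


def expected_path_reward(
    path: List[str],
    rate: Dict[str, float],
    decay: Dict[str, float],
    edges: Set[Tuple[str, str]],
) -> float:
    """Total expected reward collected along a fixed finite path.

    Uses linearity of expectation: each visit collects whatever has
    accumulated at the node since the previous visit (or since time 0).
    """
    last_visit = {node: 0 for node in rate}  # assumed: all nodes empty at t = 0
    total = 0.0
    for t, node in enumerate(path):
        # Every move must follow a unit-distance edge of the graph.
        if t > 0 and (path[t - 1], node) not in edges:
            raise ValueError(f"no edge from {path[t - 1]} to {node}")
        total += expected_accumulated(rate[node], decay[node], t - last_visit[node])
        last_visit[node] = t
    return total


if __name__ == "__main__":
    rate = {"a": 1.0, "b": 2.0}
    decay = {"a": 0.5, "b": 0.0}
    edges = {("a", "b"), ("b", "a")}
    print(expected_path_reward(["a", "b", "a"], rate, decay, edges))
```

Under these assumptions, comparing candidate paths reduces to comparing the values this function returns. Finding the best path is the hard part, which is the problem the abstract addresses.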
