Paper Information - The Robot Routing Problem for Collecting Aggregate Stochastic Rewards


We propose a new model for formalizing reward collection problems on graphs with dynamically generated rewards, which may appear and disappear according to a stochastic model. The robot routing problem is modeled as a graph whose nodes are stochastic processes generating potential rewards over discrete time. Rewards are generated according to each node's stochastic process, and at each step an existing reward disappears with a given probability. The edges of the graph encode the (unit-distance) paths between the reward locations. On visiting a node, the robot collects the reward accumulated at that node up to that time, but traveling between nodes takes time. The optimization question asks for an optimal (or epsilon-optimal) path that maximizes the expected collected reward. We consider both the finite-horizon and infinite-horizon robot routing problems: for the finite horizon, the goal is to maximize the total expected reward, while for the infinite horizon we consider limit-average objectives. We study the computational and strategy complexity of these problems, establish NP lower bounds, and show that optimal strategies require memory in general. We also provide an algorithm for computing epsilon-optimal infinite paths for arbitrary epsilon > 0.
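To make the model in the abstract concrete, the following is a minimal sketch of how the expected reward of a fixed finite path could be evaluated. It is not the paper's algorithm. Several details are assumptions made only for illustration: each node generates one unit reward per step with its own probability (`gen_prob`), every existing reward vanishes independently with the same probability (`vanish_prob`), decay happens before generation within a step, and one time step elapses before the first node of the path is collected. The function name `expected_path_reward` and all parameter names are hypothetical.

```python
from typing import Dict, List, Set


def expected_path_reward(
    edges: Dict[str, Set[str]],
    gen_prob: Dict[str, float],
    vanish_prob: float,
    path: List[str],
) -> float:
    """Expected total reward collected along a fixed finite path.

    Assumed model (illustrative only): unit rewards, Bernoulli generation
    per node and step, independent disappearance of existing rewards.
    """
    # Expected uncollected reward currently waiting at each node.
    expected = {v: 0.0 for v in edges}
    total = 0.0
    for step, node in enumerate(path):
        # Every move must follow a (unit-distance) edge of the graph.
        if step > 0 and node not in edges[path[step - 1]]:
            raise ValueError(f"no edge {path[step - 1]} -> {node}")
        # One time step passes: existing rewards survive with probability
        # 1 - vanish_prob, then each node may generate a new unit reward.
        for v in expected:
            expected[v] = expected[v] * (1.0 - vanish_prob) + gen_prob[v]
        # The robot collects everything accumulated at the visited node.
        total += expected[node]
        expected[node] = 0.0
    return total


# Hypothetical two-node instance: the robot alternates between a and b.
edges = {"a": {"b"}, "b": {"a"}}
gen_prob = {"a": 0.5, "b": 0.5}
print(expected_path_reward(edges, gen_prob, 0.5, ["a", "b", "a"]))  # 2.0
```

Because the path is fixed in advance, linearity of expectation lets the sketch track expected values directly instead of sampling. Evaluating a given path this way is the easy part; the hardness results mentioned in the abstract concern finding an optimal path.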

Rupak Majumdar | Sadegh Esmaeil Zadeh Soudjani | Rayna Dimitrova | Vinayak S. Prabhu | Ivan Gavran
