GSMDPs for Multi-Robot Sequential Decision-Making

Markov Decision Processes (MDPs) provide a well-established theoretical framework for decision-making under uncertainty. To maintain computational tractability, however, real-world problems are typically discretized in states and actions as well as in time. Assuming synchronous state transitions and actions at fixed rates may result in models that are not strictly Markovian, or in which agents are forced to idle between actions, losing their ability to react to sudden changes in the environment. In this work, we explore the application of Generalized Semi-Markov Decision Processes (GSMDPs) to a realistic multi-robot scenario. We present a case study in the domain of cooperative robotics, where real-time reactivity must be preserved and synchronous discrete-time approaches are therefore sub-optimal. The case study is evaluated both on a team of real robots and in realistic simulation. By allowing asynchronous events to be modeled over continuous time, the GSMDP approach is shown to yield higher solution quality than its discrete-time counterparts, while remaining approximately solvable by existing methods.
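To make the contrast with fixed-rate synchronous models concrete, the following minimal Python sketch illustrates the event mechanics underlying a GSMDP: concurrently enabled events with continuously distributed durations, where the decision-maker reacts to whichever event fires first instead of waiting for a global clock tick. All names here (`Event`, `simulate`, the event labels) are illustrative assumptions, not code from the paper.

```python
import random

# Minimal sketch of GSMDP-style event dynamics (illustrative, not the
# paper's implementation): each enabled event has its own continuously
# distributed duration, and the process jumps to the earliest firing
# time rather than advancing in fixed synchronous steps.

class Event:
    def __init__(self, name, sample_duration):
        self.name = name
        self.sample_duration = sample_duration  # draws a continuous delay

def simulate(events, horizon=10.0, seed=0):
    rng = random.Random(seed)
    # Each enabled event keeps its own clock (absolute firing time);
    # the clocks run concurrently.
    clocks = {e.name: e.sample_duration(rng) for e in events}
    while True:
        # React to whichever event fires first (asynchronous transition).
        name = min(clocks, key=clocks.get)
        if clocks[name] > horizon:
            break
        now = clocks[name]
        print(f"t={now:6.3f}  event '{name}' fires -> choose next action")
        # Re-sample only the clock of the event that fired; the others
        # keep their scheduled times. Carrying over remaining durations
        # is what makes the process *generalized* semi-Markov: the state
        # alone is no longer memoryless.
        fired = next(e for e in events if e.name == name)
        clocks[name] = now + fired.sample_duration(rng)

if __name__ == "__main__":
    simulate([
        Event("robot_A_arrives", lambda r: r.expovariate(1.0)),    # memoryless
        Event("robot_B_finishes", lambda r: r.uniform(1.0, 3.0)),  # non-exponential
    ])
```

Because non-exponential durations break the Markov property, solution methods such as those of Younes et al. approximate them with continuous phase-type distributions, recovering a (larger) Markovian model that standard solvers can handle.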
