Adapting an MDP planner to time-dependency: case study on a UAV coordination problem

Temporally coordinating two independent communicating agents requires planning in a time-dependent environment. This paper addresses the modeling and solution of such problems using Time-dependent Markov Decision Processes (TiMDPs). We analyze the TiMDP model and exploit its properties to introduce an improved asynchronous value iteration method. Our approach is evaluated on a UAV temporal coordination problem and on the well-known Mars rover domain.
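To illustrate the kind of computation involved, here is a minimal sketch of asynchronous (Gauss-Seidel) value iteration on a time-dependent MDP with discretized time. The toy problem, state and action names, and the unit-step time discretization are illustrative assumptions, not the paper's actual algorithm or benchmark: action durations and outcomes depend on the current time, and sweeping times in decreasing order lets each backup immediately reuse the freshest later-time values.

```python
H = 10  # planning horizon, discretized into unit time steps

# Toy UAV problem: two locations; flying takes longer after time 5.
STATES = ["base", "rendezvous"]
ACTIONS = ["wait", "fly"]

def outcomes(s, a, t):
    """Time-dependent model: list of (prob, next_state, duration, reward)."""
    if a == "fly" and s == "base":
        dur = 2 if t < 5 else 3  # duration depends on departure time
        return [(0.9, "rendezvous", dur, 10.0), (0.1, "base", dur, -1.0)]
    return [(1.0, s, 1, 0.0)]  # "wait" (or flying from rendezvous) is a no-op

# Value function over (state, time) pairs; V = 0 beyond the horizon.
V = {(s, t): 0.0 for s in STATES for t in range(H + 1)}

def backup(s, t):
    """One Bellman backup at (s, t), reading current values of V in place."""
    best = float("-inf")
    for a in ACTIONS:
        q = 0.0
        for p, s2, dur, r in outcomes(s, a, t):
            t2 = t + dur
            q += p * (r + (V[(s2, t2)] if t2 <= H else 0.0))
        best = max(best, q)
    return best

# Asynchronous sweep: since every action advances time, backing up times
# from H down to 0 converges in a single pass over the (state, time) grid.
for t in range(H, -1, -1):
    for s in STATES:
        V[(s, t)] = backup(s, t)

print(V[("base", 0)])
```

The decreasing-time ordering is what makes the asynchronous sweep effective here: time only moves forward, so every backup at time `t` already sees final values for all later times.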
