Adapting an MDP planner to time-dependency: case study on a UAV coordination problem

Temporally coordinating two independent communicating agents requires planning in a time-dependent environment. This paper addresses the modeling and solution of such problems using Time-dependent Markov Decision Processes (TiMDPs). We analyze the TiMDP model and exploit its properties to introduce an improved asynchronous value iteration method. Our approach is evaluated on a UAV temporal coordination problem and on the well-known Mars rover domain.
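To illustrate the kind of computation involved, here is a minimal sketch of asynchronous (Gauss-Seidel) value iteration on a time-dependent MDP with discretized time. The toy problem, state and action names, and the unit-step time discretization are illustrative assumptions, not the paper's actual algorithm or benchmark: action durations and outcomes depend on the current time, and sweeping times in decreasing order lets each backup immediately reuse the freshest later-time values.

```python
H = 10  # planning horizon, discretized into unit time steps

# Toy UAV problem: two locations; flying takes longer after time 5.
STATES = ["base", "rendezvous"]
ACTIONS = ["wait", "fly"]

def outcomes(s, a, t):
    """Time-dependent model: list of (prob, next_state, duration, reward)."""
    if a == "fly" and s == "base":
        dur = 2 if t < 5 else 3  # duration depends on departure time
        return [(0.9, "rendezvous", dur, 10.0), (0.1, "base", dur, -1.0)]
    return [(1.0, s, 1, 0.0)]  # "wait" (or flying from rendezvous) is a no-op

# Value function over (state, time) pairs; V = 0 beyond the horizon.
V = {(s, t): 0.0 for s in STATES for t in range(H + 1)}

def backup(s, t):
    """One Bellman backup at (s, t), reading current values of V in place."""
    best = float("-inf")
    for a in ACTIONS:
        q = 0.0
        for p, s2, dur, r in outcomes(s, a, t):
            t2 = t + dur
            q += p * (r + (V[(s2, t2)] if t2 <= H else 0.0))
        best = max(best, q)
    return best

# Asynchronous sweep: since every action advances time, backing up times
# from H down to 0 converges in a single pass over the (state, time) grid.
for t in range(H, -1, -1):
    for s in STATES:
        V[(s, t)] = backup(s, t)

print(V[("base", 0)])
```

The decreasing-time ordering is what makes the asynchronous sweep effective here: time only moves forward, so every backup at time `t` already sees final values for all later times.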
