论文信息 - Solving Concurrent Markov Decision Processes

Solving Concurrent Markov Decision Processes

Typically, Markov decision problems (MDPs) assume a single action is executed per decision epoch, but in the real world one may frequently execute certain actions in parallel. This paper explores concurrent MDPs, MDPs which allow multiple non-conflicting actions to be executed simultaneously, and presents two new algorithms. Our first approach exploits two provably sound pruning rules, and thus guarantees solution optimality. Our second technique is a fast, sampling-based algorithm, which produces c1ose-to-optimal solutions extremely quickly. Experiments show that our approaches outperform the existing algorithms producing up to two orders of magnitude speedup.

Mausam | Daniel S. Weld

[1] Carlos Guestrin,et al. Multiagent Planning with Factored MDPs , 2001, NIPS.

[2] Shlomo Zilberstein,et al. LAO*: A heuristic search algorithm that finds solutions with loops , 2001, Artif. Intell..

[3] David E. Smith,et al. Incremental Contingency Planning , 2003 .

[4] Kee-Eung Kim,et al. Solving Very Large Weakly Coupled Markov Decision Processes , 1998, AAAI/IAAI.

[5] Satinder P. Singh,et al. How to Dynamically Merge Markov Decision Processes , 1997, NIPS.

[6] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.

[7] Håkan L. S. Younes,et al. Policy Generation for Continuous-time Stochastic Domains with Concurrency , 2004, ICAPS.

[8] Avrim Blum,et al. Fast Planning Through Planning Graph Analysis , 1995, IJCAI.

[9] Craig Boutilier,et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage , 1999, J. Artif. Intell. Res..

[10] Stefan Edelkamp,et al. Taming Numbers and Durations in the Model Checking Integrated Planning System , 2003, PuK.

[11] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..

[12] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[13] David E. Smith,et al. Planning Under Continuous Time and Resource Uncertainty: A Challenge for AI , 2002, AIPS Workshop on Planning for Temporal Domains.

[14] Sridhar Mahadevan,et al. Decision-Theoretic Planning with Concurrent Temporally Extended Actions , 2001, UAI.

[15] Subbarao Kambhampati,et al. AltAltp: Online Parallelization of Plans with Heuristic State Search , 2003, J. Artif. Intell. Res..

[16] Blai Bonet,et al. Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming , 2003, ICAPS.