论文信息 - Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control

Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control

Weakly-coupled Markov decision processes can be decomposed into subprocesses that interact only through a small set of bottleneck states. We study a hierarchical reinforcement learning algorithm designed to take advantage of this particular type of decomposability. To test our algorithm, we use a decision-making problem faced by autonomous planetary rovers. In this problem, a Mars rover must decide which activities to perform and when to traverse between science sites in order to make the best use of its limited resources. In our experiments, the hierarchical algorithm performs better than Q-learning in the early stages of learning, but unlike Q-learning it converges to a suboptimal policy. This suggests that it may be advantageous to use the hierarchical algorithm when training time is limited.

Shlomo Zilberstein | Daniel S. Bernstein

[1] Ronald Parr,et al. Flexible Decomposition Algorithms for Weakly Coupled Markov Decision Problems , 1998, UAI.

[2] Thomas Dean,et al. Decomposition Techniques for Planning in Stochastic Domains , 1995, IJCAI.

[3] Doina Precup,et al. Between MOPs and Semi-MOP: Learning, Planning & Representing Knowledge at Multiple Temporal Scales , 1998 .

[4] P. Varaiya,et al. Multilayer control of large Markov chains , 1978 .

[6] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.

[7] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..

[8] Craig Boutilier,et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage , 1999, J. Artif. Intell. Res..