Decision-Theoretic Control of Planetary Rovers

Planetary rovers are small unmanned vehicles equipped with cameras and a variety of sensors used for scientific experiments. They must operate under tight constraints on resources such as operation time, power, storage capacity, and communication bandwidth. Moreover, the rover's limited computational resources restrict the complexity of on-line planning and scheduling. We describe two decision-theoretic approaches to maximizing the productivity of planetary rovers: one based on adaptive planning and the other on hierarchical reinforcement learning. Both approaches formulate the problem as a Markov decision process (MDP) and attempt to solve a large part of it off-line, exploiting the structure of the plan and the independence between plan components. We examine the advantages and limitations of these techniques, as well as their scalability.
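
To make the formulation concrete, the following is a minimal sketch, in Python, of how rover operation can be cast as an MDP over discretized resource levels and solved off-line by backward induction. The state variables, action set, rewards, and resource dynamics here are hypothetical illustrations, not the models used in the paper.

```python
# Hypothetical sketch of the MDP formulation described in the abstract --
# not the authors' implementation. A state is a pair of discretized rover
# resources (remaining time, remaining power); actions are made-up science
# tasks with stochastic resource consumption.

TIME_LEVELS = 6    # discretized remaining operation time: 0..5
POWER_LEVELS = 6   # discretized remaining battery charge: 0..5

# Hypothetical actions: name -> (science reward, [(prob, d_time, d_power), ...])
ACTIONS = {
    "image_rock":   (4.0, [(0.8, 1, 1), (0.2, 2, 1)]),  # may take longer
    "spectrometer": (7.0, [(0.7, 2, 2), (0.3, 3, 3)]),  # costlier, riskier
}

def solve():
    """Off-line backward induction over the small resource-state space."""
    V = {(t, p): 0.0 for t in range(TIME_LEVELS) for p in range(POWER_LEVELS)}
    policy = {}
    for t in range(1, TIME_LEVELS):          # successors always have smaller t
        for p in range(1, POWER_LEVELS):
            best_a, best_q = None, 0.0       # doing nothing is worth 0
            for a, (reward, outcomes) in ACTIONS.items():
                q = 0.0
                for prob, dt, dp in outcomes:
                    # An outcome that would overrun the remaining resources
                    # yields no reward: the task is aborted (modeling choice).
                    if dt <= t and dp <= p:
                        q += prob * (reward + V[(t - dt, p - dp)])
                if q > best_q:
                    best_a, best_q = a, q
            V[(t, p)], policy[(t, p)] = best_q, best_a
    return V, policy

if __name__ == "__main__":
    V, policy = solve()
    start = (TIME_LEVELS - 1, POWER_LEVELS - 1)
    print("expected science return with full resources:", round(V[start], 2))
    print("first action under the optimal policy:", policy[start])
```

Because every action here consumes at least one unit of time, the induced MDP is acyclic and a single backward pass suffices. The resulting value table and policy illustrate the kind of artifact that can be computed off-line and uploaded to the rover, leaving only a cheap table lookup for on-line execution.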
