Onboard Adaptive Learning for Planetary Surface Rover Control in Rough Terrain

Current and future NASA robotic missions to planetary surfaces are trending toward longer durations and more ambitious rough-terrain access. To achieve a higher level of autonomy on such missions, rovers will require behavior that adapts to declining rover health and unknown environmental conditions. The MER (Mars Exploration Rover) vehicles Spirit and Opportunity have both passed 350 days of operation on the Martian surface, with possible extensions to 450 days and beyond depending on rover health. Changes in navigational planning due to degradation of the drive motors as they approach the end of their lifetime are currently made on Earth for the Spirit rover. The upcoming 2009 MSL (Mars Science Laboratory) and 2013 AFL (Astrobiology Field Laboratory) missions are planned to last 300-500 days and may involve traverses of multiple kilometers over challenging terrain. This paper presents an adaptive control algorithm for onboard learning of weights within a free flow hierarchy (FFH) behavior framework for autonomous control of planetary surface rovers, explicitly addressing the issues of rover health and rough-terrain access. We also present results from laboratory and field studies.
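The details of the FFH weight-learning scheme are developed in the body of the paper; as a rough conceptual illustration only, the sketch below shows a minimal free flow hierarchy in which activation propagates from parent behaviors to child behaviors through weighted links, actions are selected at the leaves, and the link weights are nudged by a scalar reward signal. All names (e.g., FFHNode, update_weights) and the specific update rule are assumptions for illustration, not the algorithm from the paper.

```python
# Minimal sketch of a free flow hierarchy (FFH) with adaptive link weights.
# The structure and the reinforcement-style weight update are illustrative
# assumptions, not the method described in this paper.

class FFHNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []               # child behavior nodes
        self.weights = [1.0] * len(self.children)    # learnable link weights
        self.activation = 0.0

    def propagate(self, stimulus=0.0):
        """Combine external stimulus with inherited activation and pass it down."""
        self.activation += stimulus
        for w, child in zip(self.weights, self.children):
            child.activation += w * self.activation
            child.propagate()


def select_action(leaves):
    """Winner-take-all selection among leaf (action) nodes."""
    return max(leaves, key=lambda node: node.activation)


def update_weights(node, reward, lr=0.1):
    """Reinforcement-style nudge: strengthen links that fed rewarded activity."""
    for i, child in enumerate(node.children):
        node.weights[i] += lr * reward * child.activation
        update_weights(child, reward, lr)
```

In the setting described here, the reward signal would presumably be derived from onboard assessments of rover health and terrain difficulty, so that the hierarchy gradually favors behaviors that remain safe as hardware degrades.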
