Policy-contingent abstraction for robust robot control

This paper presents a scalable control algorithm that enables a deployed mobile robot to make high-level control decisions under full consideration of its probabilistic belief. We draw on insights from the rich literature on structured robot controllers and hierarchical MDPs to propose PolCA, a hierarchical probabilistic control algorithm that learns both subtask-specific state abstractions and policies. The resulting controller has been successfully implemented onboard a mobile robotic assistant deployed in a nursing facility. To the best of our knowledge, this work is a unique instance of applying POMDPs to high-level robotic control problems.
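To make the core idea concrete, the following is a minimal sketch of the kind of hierarchical belief-space control the abstract describes: each subtask carries its own state abstraction, the robot's full belief is projected into that smaller abstract space, and the subtask's local policy acts on the projected belief. All names here (Subtask, project_belief, the toy states and actions) are hypothetical illustrations, not the paper's actual PolCA algorithm or API.

```python
# Sketch of subtask-specific state abstraction over a belief, assuming a
# hand-built two-level hierarchy. Illustrative only; not the PolCA code.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Subtask:
    name: str
    # Subtask-specific abstraction: maps each full state to an abstract state.
    abstraction: Dict[str, str]
    # Local policy defined over beliefs in the abstract state space.
    policy: Callable[[Dict[str, float]], str]

def project_belief(belief: Dict[str, float],
                   abstraction: Dict[str, str]) -> Dict[str, float]:
    """Compress a belief over full states into a subtask's abstract space
    by summing probability mass within each abstract cluster."""
    abstract_belief: Dict[str, float] = {}
    for state, p in belief.items():
        cluster = abstraction[state]
        abstract_belief[cluster] = abstract_belief.get(cluster, 0.0) + p
    return abstract_belief

# Toy domain: full state = (location, user-found?) flattened into strings.
# The Navigate subtask ignores whether the user has been found, so it
# clusters states by location only; its policy sees a much smaller belief.
navigate = Subtask(
    name="navigate",
    abstraction={"hall_no": "hall", "hall_yes": "hall",
                 "room_no": "room", "room_yes": "room"},
    policy=lambda b: "go_to_room" if b.get("hall", 0.0) > 0.5 else "stop",
)

# The FindUser subtask, conversely, ignores location entirely.
find_user = Subtask(
    name="find_user",
    abstraction={"hall_no": "no", "hall_yes": "yes",
                 "room_no": "no", "room_yes": "yes"},
    policy=lambda b: "announce" if b.get("yes", 0.0) > 0.5 else "search",
)

def top_level_action(belief: Dict[str, float]) -> str:
    # Stand-in for a learned top-level policy: switch to FindUser once the
    # robot believes it has reached the room, otherwise keep navigating.
    in_room = belief.get("room_no", 0.0) + belief.get("room_yes", 0.0)
    active = find_user if in_room > 0.5 else navigate
    return active.policy(project_belief(belief, active.abstraction))

if __name__ == "__main__":
    belief = {"hall_no": 0.7, "hall_yes": 0.0,
              "room_no": 0.25, "room_yes": 0.05}
    print(top_level_action(belief))  # -> "go_to_room"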
