Learning to Select State Machines using Expert Advice on an Autonomous Robot

Hierarchical state machines have proven to be a powerful tool for controlling autonomous robots due to their flexibility and modularity. In most real robot implementations, however, the control hierarchy is hand-coded, making the development process time-intensive and error-prone. In this paper, we explore an expert-advice learning approach, based on the Exp3 algorithm of Auer and colleagues (1995), to help overcome these limitations. In particular, we develop a modified learning algorithm, which we call rExp3, that exploits the structure provided by a control hierarchy by treating each state machine as an 'expert'. Our experiments validate the performance of rExp3 on a real robot performing a task and demonstrate that rExp3 quickly learns to select the best state machine expert to execute. Through these investigations, we identify a need for faster learning recovery when the relative performances of the experts reorder, for example in response to a discrete environment change. We introduce a modified learning rule to improve the recovery rate in these situations and demonstrate through simulation experiments that rExp3 performs as well as or better than Exp3 under such conditions.
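For context, the standard Exp3 scheme the paper builds on maintains exponential weights over the available experts, mixes in uniform exploration, and reweights using an importance-weighted reward estimate. The sketch below shows this baseline algorithm only, not the authors' rExp3 modification; the function name, `reward_fn` callback, and parameter values are illustrative assumptions:

```python
import math
import random

def exp3(num_experts, reward_fn, rounds, gamma=0.1):
    """Minimal Exp3 sketch: exponential weights over experts
    (here, candidate state machines) with uniform exploration mixed in."""
    weights = [1.0] * num_experts
    for _ in range(rounds):
        total = sum(weights)
        # Mixed distribution: mostly proportional to weights, gamma uniform.
        probs = [(1 - gamma) * w / total + gamma / num_experts
                 for w in weights]
        # Sample which expert (state machine) to execute this round.
        choice = random.choices(range(num_experts), weights=probs)[0]
        reward = reward_fn(choice)  # observed payoff, assumed in [0, 1]
        # Importance-weighted estimate keeps the update unbiased despite
        # only observing the reward of the chosen expert.
        estimate = reward / probs[choice]
        weights[choice] *= math.exp(gamma * estimate / num_experts)
    total = sum(weights)
    return [w / total for w in weights]
```

In the robot setting described in the abstract, each 'expert' would correspond to a candidate state machine, and the sampled index determines which machine is run for the next episode; rExp3 then modifies the update rule so the distribution recovers faster when the experts' relative performances reorder.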

[1] Brett Browning, et al. Plays as Effective Multiagent Plans Enabling Opponent-Adaptive Play Selection, 2004, ICAPS.

[2] Reid G. Simmons, et al. A task description language for robot control, 1998, Proceedings of the 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[3] Hoa G. Nguyen, et al. Segway robotic mobility platform, 2004, SPIE Optics East.

[4] Vladimir Vovk, et al. Aggregating strategies, 1990, COLT '90.

[5] Brett Browning, et al. Development of a soccer-playing dynamically-balancing mobile robot, 2004, IEEE International Conference on Robotics and Automation (ICRA '04).

[6] Brett Browning, et al. STP: Skills, tactics, and plays for multi-robot control in adversarial environments, 2005.

[7] Manuela M. Veloso, et al. Convergence of Gradient Dynamics with a Variable Learning Rate, 2001, ICML.

[8] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science.

[9] Manfred K. Warmuth, et al. The weighted majority algorithm, 1989, 30th Annual Symposium on Foundations of Computer Science.

[10] Rodney A. Brooks, et al. A Robust Layered Control System for a Mobile Robot, 1986, IEEE Journal on Robotics and Automation.

[11] H. Robbins. Some aspects of the sequential design of experiments, 1952.

[12] Tucker R. Balch, et al. Io, Ganymede, and Callisto: A Multiagent Robot Trash-Collecting Team, 1995, AI Magazine.

[13] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.