HIGH-LEVEL CONTROL OF AUTONOMOUS ROBOTS USING A BEHAVIOR-BASED SCHEME AND REINFORCEMENT LEARNING

Abstract This paper proposes a behavior-based scheme for the high-level control of autonomous robots. The control scheme has two main characteristics. First, behavior coordination is performed through a hybrid methodology that combines the robustness and modularity of competitive approaches with the optimized trajectories of cooperative ones. Second, each behavior's state/action mapping is learnt by means of Reinforcement Learning (RL): a new continuous approach to the Q-learning algorithm, implemented with a multi-layer neural network, is used. The behavior-based scheme attempts to fulfill simple missions in which several behaviors/tasks compete for control of the vehicle. This paper centers on the RL-based behaviors. To test the feasibility of the proposed Neural-Q_learning scheme, real experiments were carried out with the underwater robot ODIN on a target-following behavior. The results show that the behavior converged to an optimal state/action mapping. A discussion of the proposed approach is given, together with an overall description of the high-level control scheme.
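
As a rough illustration of the kind of update a Neural-Q_learning behavior performs, the sketch below trains a one-hidden-layer network toward the standard Q-learning target r + gamma * max_a' Q(s', a'). This is a minimal sketch under stated assumptions, not the authors' implementation: the state dimension, the discretized action set (the paper's approach is continuous), the network size, and all hyperparameters below are illustrative choices.

```python
# Minimal sketch of a neural-network Q-learning update (illustrative only).
# A one-hidden-layer network maps a continuous state to one Q-value per
# discretized action; weights are adjusted by a temporal-difference step.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4    # e.g. relative target position/velocity (assumed)
N_ACTIONS = 5    # discretized control commands (assumed)
HIDDEN = 16      # hidden units (assumed)
ALPHA = 0.01     # learning rate
GAMMA = 0.9      # discount factor

# Network weights: state -> tanh hidden layer -> Q-value per action.
W1 = rng.normal(0.0, 0.1, (HIDDEN, STATE_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (N_ACTIONS, HIDDEN))
b2 = np.zeros(N_ACTIONS)

def q_values(s):
    """Forward pass: Q-values for all actions, plus hidden activations."""
    h = np.tanh(W1 @ s + b1)
    return W2 @ h + b2, h

def select_action(s, eps=0.1):
    """Epsilon-greedy exploration over the discretized action set."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(s)[0]))

def td_update(s, a, r, s_next, done):
    """One Q-learning step: move Q(s, a) toward r + gamma * max Q(s', .)."""
    global W1, b1, W2, b2
    q, h = q_values(s)
    q_next, _ = q_values(s_next)
    target = r if done else r + GAMMA * np.max(q_next)
    err = target - q[a]                       # TD error on the taken action
    dh = err * W2[a] * (1.0 - h ** 2)         # backprop through tanh layer
    W2[a] += ALPHA * err * h                  # update only the chosen output
    b2[a] += ALPHA * err
    W1 += ALPHA * np.outer(dh, s)
    b1 += ALPHA * dh

# Example update on a dummy transition (placeholder values):
s = rng.normal(size=STATE_DIM)
a = select_action(s)
s_next = rng.normal(size=STATE_DIM)
td_update(s, a, r=1.0, s_next=s_next, done=False)
```

The function approximator replaces the tabular Q-function so that continuous sensor readings need not be discretized into states; the action discretization here is purely a simplification of the sketch.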
