Formation control using GQ(λ) reinforcement learning

Formation control is an important subtask for autonomous robots. From flying drones to swarm robotics, many applications require their agents to control their group behavior. Especially when moving autonomously in human-robot teams, motion and formation control of a group of agents is a critical and challenging task. In this work, we propose a method for applying the GQ(λ) reinforcement learning algorithm to a leader-follower formation control scenario on the e-puck robot platform. To enable control via classical reinforcement learning, we show how we model the formation control problem as a Markov decision process (MDP). This allows us to use the Greedy-GQ(λ) algorithm to learn a leader-follower control law. The applicability and performance of this control approach are investigated both in simulation and on real robots. In both experiments, the followers are able to move behind the leader. Additionally, the algorithm improves the smoothness of the follower's path online, which is beneficial in the context of human-robot interaction.
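
The abstract does not spell out the concrete MDP design or the update rules, so the following Python sketch is only an illustration of a Greedy-GQ(λ) learner with linear function approximation in the spirit of Maei and Sutton (2010). The state features, action set, reward shaping, and hyper-parameters are assumptions made for this example, not the paper's actual choices, and the off-policy importance-sampling ratio is fixed to 1 for brevity.

import numpy as np

# Hypothetical problem sizes for a leader-follower setup on a differential-
# drive robot such as the e-puck: a handful of discrete wheel-speed commands
# and a small feature vector describing the follower's pose relative to the
# leader (e.g., distance and bearing terms). Both are illustrative.
N_ACTIONS = 5
N_STATE_FEATURES = 8

def reward(dist, bearing, d_ref=0.15, k_d=1.0, k_b=0.5):
    # Hypothetical shaping: penalize deviation from a desired follow
    # distance d_ref (meters) and from zero bearing to the leader.
    return -(k_d * (dist - d_ref) ** 2 + k_b * bearing ** 2)

def features(state, action):
    # Stack per-action blocks: phi(s, a) holds the state features in the
    # block belonging to `action` and zeros elsewhere.
    phi = np.zeros(N_STATE_FEATURES * N_ACTIONS)
    phi[action * N_STATE_FEATURES:(action + 1) * N_STATE_FEATURES] = state
    return phi

class GreedyGQ:
    def __init__(self, alpha=0.05, beta=0.005, gamma=0.9, lam=0.8, eps=0.1):
        n = N_STATE_FEATURES * N_ACTIONS
        self.theta = np.zeros(n)  # main weights (linear action values)
        self.w = np.zeros(n)      # auxiliary weights (gradient correction)
        self.e = np.zeros(n)      # eligibility trace
        self.alpha, self.beta = alpha, beta
        self.gamma, self.lam, self.eps = gamma, lam, eps

    def q(self, state, action):
        return self.theta @ features(state, action)

    def greedy_action(self, state):
        return int(np.argmax([self.q(state, a) for a in range(N_ACTIONS)]))

    def act(self, state):
        # Epsilon-greedy behavior policy; the learning target is greedy.
        if np.random.rand() < self.eps:
            return np.random.randint(N_ACTIONS)
        return self.greedy_action(state)

    def update(self, state, action, r, next_state, done):
        phi = features(state, action)
        # Greedy-GQ: the target policy is greedy with respect to theta.
        phi_bar = features(next_state, self.greedy_action(next_state))
        if done:
            phi_bar = np.zeros_like(phi_bar)  # no bootstrapping at terminal
        delta = r + self.gamma * self.theta @ phi_bar - self.theta @ phi
        # Accumulating trace (importance-sampling ratio assumed to be 1).
        self.e = self.gamma * self.lam * self.e + phi
        # Two-timescale GQ(lambda) updates of the main and auxiliary weights.
        self.theta += self.alpha * (
            delta * self.e
            - self.gamma * (1 - self.lam) * (self.w @ self.e) * phi_bar)
        self.w += self.beta * (delta * self.e - (self.w @ phi) * phi)
        if done:
            self.e[:] = 0.0  # reset the trace between episodes

In use, each control step would map the follower's sensed leader-relative pose to a feature vector, pick a wheel-speed command with act, execute it, and feed the observed transition to update; the auxiliary weight vector w implements the gradient-correction term that keeps the off-policy update stable under function approximation.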
