A survey of reinforcement learning research and its application for multi-robot systems

Reinforcement learning aims to obtain an optimal or near-optimal policy through trial-and-error interaction with a dynamic environment. After an introduction to the basics of reinforcement learning, the temporal-difference (TD), Q-learning, Dyna, and Sarsa algorithms, all based on the Markov decision process (MDP) model, are discussed. Reinforcement learning based on partially observable Markov decision processes (POMDPs) and on semi-Markov decision processes (SMDPs), both suited to uncertain environments, is then analyzed. The state of research on Q-learning in the field of multi-robot systems is also presented. Finally, the main challenges and directions for further research are outlined.
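For concreteness, the following is a minimal Python sketch of the tabular Q-learning update surveyed above. The Gym-style environment interface (reset/step returning next state, reward, and a done flag), the discrete action list, and the hyperparameter values are illustrative assumptions, not details taken from the paper.

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
        # Assumed interface: env.reset() -> state,
        # env.step(a) -> (next_state, reward, done); `actions` is a discrete list.
        Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy exploration: mostly exploit, occasionally explore.
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                # One-step TD target; bootstrap only if the episode continues.
                best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

Sarsa differs from this sketch only in the target: it bootstraps from the action actually taken in the next state (on-policy) rather than the greedy maximum, while Dyna additionally replays updates from a learned environment model.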
