A survey of reinforcement learning research and its application for multi-robot systems

Reinforcement learning aims to obtain an optimal or near-optimal policy through trial-and-error interaction with a dynamic environment. After an introduction to the basics of reinforcement learning, the temporal-difference (TD), Q-learning, Dyna, and Sarsa algorithms, all based on the Markov decision process (MDP) model, are discussed. Reinforcement learning based on partially observable Markov decision processes (POMDPs) and on semi-Markov decision processes (SMDPs), both suited to uncertain environments, is then analyzed. The state of research on Q-learning in the field of multi-robot systems is also presented. Finally, the main challenges and directions for further research are outlined.
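For concreteness, the following is a minimal Python sketch of the tabular Q-learning update surveyed above. The Gym-style environment interface (reset/step returning next state, reward, and a done flag), the discrete action list, and the hyperparameter values are illustrative assumptions, not details taken from the paper.

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
        # Assumed interface: env.reset() -> state,
        # env.step(a) -> (next_state, reward, done); `actions` is a discrete list.
        Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Epsilon-greedy exploration: mostly exploit, occasionally explore.
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                # One-step TD target; bootstrap only if the episode continues.
                best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

Sarsa differs from this sketch only in the target: it bootstraps from the action actually taken in the next state (on-policy) rather than the greedy maximum, while Dyna additionally replays updates from a learned environment model.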
