A new approach for structural credit assignment in distributed reinforcement learning systems

Most existing algorithms for structural credit assignment were developed for competitive reinforcement learning systems. In a competitive reinforcement learning system, agents are activated one at a time, so only one agent is active at any moment and structural credit assignment can be handled by temporal credit assignment algorithms. In collaborative reinforcement learning systems, agents are activated simultaneously, so the global reinforcement signal fed back from the environment must be transformed into a reinforcement vector that credits the individual agents, a difficulty that cannot be sidestepped. This article offers a first feasible and efficient solution to the structural credit assignment problem in collaborative reinforcement learning systems. Experiments show that the algorithm converges rapidly and that the resulting assignment is satisfactory.
