Previous approaches to multi-agent reinforcement learning are either very limited or heuristic by nature. The main reason is that each agent's environment continually changes because the other agents keep changing. Traditional reinforcement learning algorithms cannot properly deal with this. This paper, however, introduces a novel, general, sound method for multiple reinforcement learning agents living a single life with limited computational resources in an unrestricted environment. The method properly takes into account that whatever some agent learns at some point may affect learning conditions for other agents, or for itself, at any later point. It is based on an efficient, stack-based backtracking procedure called "environment-independent reinforcement acceleration" (EIRA), which is guaranteed to make each agent's learning history a history of performance improvements (long-term reinforcement accelerations). The principles have been implemented in an illustrative multi-agent system, where each agent is in fact just a connection in a fully recurrent reinforcement learning neural net.
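The paper itself specifies EIRA in full; the following is only a minimal, hedged sketch of the stack-based backtracking idea described above, not the authors' implementation. The class and method names (EIRASketch, record_modification, backtrack) and the undo-callback mechanism are assumptions introduced for illustration. Each policy modification is checkpointed together with the cumulative reward and time at which it was made; at evaluation points, modifications whose long-term reward per time has not increased relative to the previous surviving checkpoint are popped and undone, so the surviving history remains a sequence of reinforcement accelerations.

```python
class EIRASketch:
    """Illustrative checkpoint stack in the spirit of EIRA (not the original code)."""

    def __init__(self):
        self.stack = []           # entries: (undo_fn, reward_at_mod, time_at_mod)
        self.total_reward = 0.0
        self.time = 0.0

    def record_modification(self, undo_fn):
        """Push a checkpoint for a just-executed policy modification.
        undo_fn is a callable that restores the policy to its state
        before the modification (an assumed mechanism for this sketch)."""
        self.stack.append((undo_fn, self.total_reward, self.time))

    def observe(self, reward, dt=1.0):
        """Accumulate reward and elapsed time during the agent's single life."""
        self.total_reward += reward
        self.time += dt

    def backtrack(self):
        """Undo recent modifications until reward per time since each surviving
        checkpoint increases monotonically from the bottom of the stack to the top."""
        while self.stack:
            undo_fn, r_mod, t_mod = self.stack[-1]
            # reward rate achieved since the most recent surviving modification
            rate_since = (self.total_reward - r_mod) / max(self.time - t_mod, 1e-9)
            # reward rate since the previous checkpoint (or since the agent's birth)
            if len(self.stack) >= 2:
                _, r_prev, t_prev = self.stack[-2]
            else:
                r_prev, t_prev = 0.0, 0.0
            rate_before = (self.total_reward - r_prev) / max(self.time - t_prev, 1e-9)
            if rate_since > rate_before:
                break                 # history is still a story of accelerations
            self.stack.pop()
            undo_fn()                 # restore the policy to its earlier state
```

In a multi-agent setting, one such stack per agent would keep each agent's own learning history a history of improvements, regardless of what the other agents do in the meantime.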