A GENERAL METHOD FOR MULTI-AGENT REINFORCEMENT LEARNING IN UNRESTRICTED ENVIRONMENTS

Previous approaches to multi-agent reinforcement learning are either very limited or heuristic by nature. The main reason is that each agent's environment continually changes because the other agents keep changing; traditional reinforcement learning algorithms cannot properly deal with this. This paper, however, introduces a novel, general, sound method for multiple reinforcement learning agents living a single life with limited computational resources in an unrestricted environment. The method properly takes into account that whatever some agent learns at some point may affect the learning conditions of other agents, or of the agent itself, at any later point. It is based on an efficient, stack-based backtracking procedure called "environment-independent reinforcement acceleration" (EIRA), which is guaranteed to make each agent's learning history a history of performance improvements (long-term reinforcement accelerations). The principles have been implemented in an illustrative multi-agent system in which each agent is just a connection in a fully recurrent reinforcement learning neural net.
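
To make the abstract's central mechanism concrete, the following Python sketch illustrates the kind of stack-based backtracking bookkeeping EIRA performs: each policy modification is pushed onto a stack together with the time and cumulative reinforcement at which it was made, and backtracking pops (undoes) modifications until reinforcement intake per time strictly increases from each surviving checkpoint to the next. This is a minimal sketch under assumptions of the editor's making; the class and method names (EIRAAgent, modify, observe, backtrack) and the exact form of the acceleration test are illustrative, not the paper's implementation.

class EIRAAgent:
    """Hypothetical sketch of EIRA-style stack-based backtracking."""

    def __init__(self, params):
        self.params = dict(params)  # current (modifiable) policy parameters
        self.stack = []             # checkpoints of still-valid modifications
        self.t = 0.0                # lifetime elapsed so far
        self.r = 0.0                # cumulative reinforcement so far

    def modify(self, new_params):
        # Push a checkpoint (time, reinforcement, overwritten values),
        # then apply the tentative policy modification.
        saved = {k: self.params[k] for k in new_params}
        self.stack.append((self.t, self.r, saved))
        self.params.update(new_params)

    def observe(self, reward, dt=1.0):
        # Advance lifetime and accumulate reinforcement from the environment.
        self.t += dt
        self.r += reward

    def _history_accelerates(self):
        # Reinforcement per time since each surviving checkpoint must
        # strictly exceed the rate since every earlier checkpoint
        # (and the lifelong average), from oldest to newest.
        prev_rate = self.r / max(self.t, 1e-9)
        for t_cp, r_cp, _ in self.stack:
            rate = (self.r - r_cp) / max(self.t - t_cp, 1e-9)
            if rate <= prev_rate:
                return False
            prev_rate = rate
        return True

    def backtrack(self):
        # Undo the most recent surviving modification until the remaining
        # history is a history of reinforcement accelerations.
        while self.stack and not self._history_accelerates():
            _, _, saved = self.stack.pop()
            self.params.update(saved)

For example, after agent.modify({"w": 0.5}) and some calls to agent.observe(...), a call to agent.backtrack() keeps the modification only if average reward intake since it was made has accelerated; otherwise the saved parameters are restored. Because only non-accelerating modifications are ever undone, each agent's surviving learning history is, by construction, a history of performance improvements.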