Scalable cooperative multiagent reinforcement learning in the context of an organization

Reinforcement learning techniques have been successfully used to solve single-agent optimization problems, but many real-world problems involve multiple agents, i.e., multi-agent systems. This explains the growing interest in multi-agent reinforcement learning (MARL) algorithms. To be applicable to large real-world domains, MARL algorithms need to be both stable and scalable. A scalable MARL algorithm continues to perform adequately as the number of agents increases. A MARL algorithm is stable if all agents (eventually) converge to a stable joint policy. Unfortunately, most previous approaches lack at least one of these two crucial properties. This dissertation proposes a scalable and stable MARL framework that uses a network of mediator agents. The network connections restrict the space of valid policies, which reduces search time and achieves scalability. Optimizing performance in such a system decomposes into two subproblems: optimizing the mediators' local policies and optimizing the structure of the network interconnecting mediators and servers. I present extensions to Markovian models that allow exponential savings in time and space. I also present the first integrated framework for MARL in a network, which includes both a MARL algorithm and a reorganization algorithm that operate concurrently with one another. To evaluate performance, I use the distributed task allocation problem as a motivating domain.
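The following is a minimal illustrative sketch, not the dissertation's algorithm, of the core intuition above: in a distributed task allocation setting, a mediator may forward tasks only to the servers it is connected to, so the network links restrict each mediator's action set and hence the joint-policy space. The class name `Mediator`, the method names, and the simple epsilon-greedy local learning rule are assumptions made for illustration only.

```python
import random
from collections import defaultdict

class Mediator:
    """A mediator that learns, per task type, which connected server to forward tasks to."""

    def __init__(self, name, neighbors, epsilon=0.1, alpha=0.2):
        self.name = name
        self.neighbors = neighbors        # network links restrict the valid actions
        self.epsilon = epsilon            # exploration rate
        self.alpha = alpha                # learning rate
        # q[task_type][neighbor] estimates the value of routing that task type there
        self.q = defaultdict(lambda: defaultdict(float))

    def route_task(self, task_type):
        """Pick a neighbor; the local policy is defined only over connected neighbors."""
        if random.random() < self.epsilon:
            return random.choice(self.neighbors)
        return max(self.neighbors, key=lambda n: self.q[task_type][n])

    def update(self, task_type, neighbor, reward):
        """Update the local estimate from the observed routing reward."""
        old = self.q[task_type][neighbor]
        self.q[task_type][neighbor] = old + self.alpha * (reward - old)

# Toy usage: two mediators, each wired to only a subset of three servers.
servers = {"s1": 0.9, "s2": 0.4, "s3": 0.7}   # probability a server completes a task
m1 = Mediator("m1", neighbors=["s1", "s2"])    # m1 cannot even consider s3
m2 = Mediator("m2", neighbors=["s2", "s3"])

for _ in range(1000):
    for m in (m1, m2):
        choice = m.route_task("task_A")
        reward = 1.0 if random.random() < servers[choice] else 0.0
        m.update("task_A", choice, reward)

print("m1 prefers:", max(m1.neighbors, key=lambda n: m1.q["task_A"][n]))
print("m2 prefers:", max(m2.neighbors, key=lambda n: m2.q["task_A"][n]))
```

Because each mediator searches only over its own neighbors rather than over all servers, the size of its local policy space is bounded by its degree in the network; this is the sense in which the network structure trades optimality of the unconstrained joint policy for scalability.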