Learning how to flock: deriving individual behaviour from collective behaviour with multi-agent reinforcement learning and natural evolution strategies

This work proposes a method for predicting the internal mechanisms of individual agents using observed collective behaviours by multi-agent reinforcement learning (MARL). Since the emergence of group behaviour among many agents can undergo phase transitions, and the action space will not in general be smooth, natural evolution strategies were adopted for updating a policy function. We tested the approach using a well-known flocking algorithm as a target model for our system to learn. With the data obtained from this rule-based model, the MARL model was trained, and its acquired behaviour was compared to the original. In the process, we discovered that agents trained by MARL can self-organize flow patterns using only local information. The expressed pattern is robust to changes in the initial positions of agents, whilst being sensitive to the training conditions used.