Self-Supervised Learning for Multi-Goal Grid World: Comparing Leela and Deep Q Network

Modern machine learning research has explored numerous approaches to reinforcement learning with multiple goals and sparse rewards, and to learning correct actions from a small number of exploratory samples. We explore the ability of a self-supervised system that automatically creates and tests symbolic hypotheses about the world to address these same issues. Leela is a system that builds an understanding of the world using constructivist artificial intelligence. For our study, we create an N × N grid world with goals defined over proprioceptive or visual positions for exploration. We compare Leela to a DQN augmented with hindsight to improve multi-goal learning with sparse rewards. Our results show that Leela learns to solve multi-goal problems in an N × N world with approximately 160N exploratory steps, compared to the 360N steps required by the DQN.
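To make the experimental setting concrete, the following is a minimal sketch of a sparse-reward, multi-goal N × N grid world, together with a hindsight-style relabeling step of the kind used to augment the DQN baseline. The class and function names, the "final-state" relabeling strategy, and the transition format are illustrative assumptions, not the paper's exact environment or implementation:

```python
import random

class MultiGoalGridWorld:
    """Illustrative sparse-reward, multi-goal N x N grid world (an
    assumption about the setup, not the paper's exact environment)."""

    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

    def __init__(self, n, seed=None):
        self.n = n
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Each episode samples a fresh start position and goal position.
        self.agent = (self.rng.randrange(self.n), self.rng.randrange(self.n))
        self.goal = (self.rng.randrange(self.n), self.rng.randrange(self.n))
        return self.agent, self.goal

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        # Clip movement to the grid boundaries.
        x = min(max(self.agent[0] + dx, 0), self.n - 1)
        y = min(max(self.agent[1] + dy, 0), self.n - 1)
        self.agent = (x, y)
        done = self.agent == self.goal
        reward = 1.0 if done else 0.0  # sparse: reward only at the goal
        return self.agent, reward, done


def hindsight_relabel(episode):
    """Hindsight-style 'final' relabeling: reuse a failed episode by
    pretending the last state actually reached was the intended goal.
    Each transition in `episode` is (state, action, next_state, reward)."""
    achieved = episode[-1][2]  # final next_state becomes the substitute goal
    relabeled = []
    for state, action, next_state, _ in episode:
        reward = 1.0 if next_state == achieved else 0.0
        relabeled.append((state, achieved, action, next_state, reward))
    return relabeled
```

Relabeling converts every failed trajectory into a successful one for a substitute goal, which is how hindsight methods extract a learning signal despite sparse rewards.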