Self-Supervised Learning for Multi-Goal Grid World: Comparing Leela and Deep Q Network

Modern machine learning research has explored numerous approaches to reinforcement learning with multiple goals and sparse rewards, and to learning correct actions from a small number of exploratory samples. We explore the ability of a self-supervised system that automatically creates and tests symbolic hypotheses about the world to address these same issues. Leela is a system that builds an understanding of the world using constructivist artificial intelligence. For our study, we create an N × N grid world with goals defined over proprioceptive or visual positions for exploration. We compare Leela to a DQN augmented with hindsight to improve multi-goal learning with sparse rewards. Our results show that Leela learns to solve multi-goal problems in an N × N world with approximately 160N exploratory steps, compared to the 360N steps required by the DQN.
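To make the experimental setting concrete, the following is a minimal sketch of a sparse-reward, multi-goal N × N grid world, together with a hindsight-style relabeling step of the kind used to augment the DQN baseline. The class and function names, the "final-state" relabeling strategy, and the transition format are illustrative assumptions, not the paper's exact environment or implementation:

```python
import random

class MultiGoalGridWorld:
    """Illustrative sparse-reward, multi-goal N x N grid world (an
    assumption about the setup, not the paper's exact environment)."""

    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

    def __init__(self, n, seed=None):
        self.n = n
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Each episode samples a fresh start position and goal position.
        self.agent = (self.rng.randrange(self.n), self.rng.randrange(self.n))
        self.goal = (self.rng.randrange(self.n), self.rng.randrange(self.n))
        return self.agent, self.goal

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        # Clip movement to the grid boundaries.
        x = min(max(self.agent[0] + dx, 0), self.n - 1)
        y = min(max(self.agent[1] + dy, 0), self.n - 1)
        self.agent = (x, y)
        done = self.agent == self.goal
        reward = 1.0 if done else 0.0  # sparse: reward only at the goal
        return self.agent, reward, done


def hindsight_relabel(episode):
    """Hindsight-style 'final' relabeling: reuse a failed episode by
    pretending the last state actually reached was the intended goal.
    Each transition in `episode` is (state, action, next_state, reward)."""
    achieved = episode[-1][2]  # final next_state becomes the substitute goal
    relabeled = []
    for state, action, next_state, _ in episode:
        reward = 1.0 if next_state == achieved else 0.0
        relabeled.append((state, achieved, action, next_state, reward))
    return relabeled
```

Relabeling converts every failed trajectory into a successful one for a substitute goal, which is how hindsight methods extract a learning signal despite sparse rewards.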