Transfer Learning and Curriculum Learning in Sokoban

Transfer learning can speed up training in machine learning and is used routinely in classification tasks: prior knowledge from other tasks is reused to pre-train networks for new tasks. In reinforcement learning, transferring a learned behavior policy to new environments remains a challenge, especially for tasks that require extensive planning. Sokoban, a challenging puzzle game, is widely used as a benchmark in planning-based reinforcement learning. In this paper, we show how prior knowledge improves learning in Sokoban tasks. We find that reusing previously learned feature representations accelerates learning on new, more complex instances; in effect, we show how curriculum learning, from simple to complex tasks, works in Sokoban. Furthermore, feature representations learned on simpler instances are more general and therefore yield positive transfer toward more complex tasks, but not vice versa. We also study which part of the knowledge matters most for transfer to succeed, and identify which layers should be used for pre-training.
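The layer-wise transfer described above (reusing early feature layers from a network trained on simpler instances, then training the remaining layers on harder instances) can be sketched in plain Python. All names and the toy network structure below are illustrative assumptions, not the paper's actual architecture:

```python
import copy
import random

def make_network(layer_sizes, seed=0):
    """Build a toy network as a list of layers (weights as nested lists).
    A hypothetical stand-in for the feature/policy layers of a Sokoban agent."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        layers.append({
            "w": [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
            "frozen": False,
        })
    return layers

def transfer_layers(source, target, k):
    """Copy the first k layers from a source net (trained on simple
    instances) into a target net (for harder instances) and freeze them,
    so only the later, task-specific layers are trained on the new task."""
    for i in range(k):
        target[i]["w"] = copy.deepcopy(source[i]["w"])
        target[i]["frozen"] = True
    return target

pretrained = make_network([4, 8, 8, 2], seed=1)  # trained on simple levels
student = make_network([4, 8, 8, 2], seed=2)     # fresh net for harder levels
student = transfer_layers(pretrained, student, k=2)

assert student[0]["w"] == pretrained[0]["w"] and student[0]["frozen"]
assert not student[2]["frozen"]  # the output head stays trainable
```

The choice of `k` corresponds to the question the abstract raises of which layers to reuse: transferring more layers carries over more of the general feature knowledge, while leaving later layers unfrozen lets the agent adapt to the harder instances.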
