Solving the Rubik's Cube Without Human Knowledge

A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision. Recently, deep reinforcement learning algorithms combined with self-play have achieved superhuman proficiency in Go, Chess, and Shogi without human data or domain knowledge. In those games, a reward is always received at the end of the game; in many combinatorial optimization environments, however, rewards are sparse and episodes are not guaranteed to terminate. We introduce Autodidactic Iteration, a novel reinforcement learning algorithm that teaches itself how to solve the Rubik's Cube with no human assistance. Our algorithm solves 100% of randomly scrambled cubes while achieving a median solve length of 30 moves -- less than or equal to solvers that employ human domain knowledge.

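The abstract names Autodidactic Iteration but does not spell out its update rule. The sketch below is one plausible reading of that self-training loop: states are generated by scrambling outward from the solved cube, and each state's value target is bootstrapped from its best one-step successor. Everything concrete here is an assumption for illustration: the toy permutation puzzle standing in for the cube, the names apply_move, net_value, and adi_targets, and the constant reward values are not the authors' code.

```python
# Minimal, hypothetical sketch of Autodidactic-Iteration-style target
# generation. A small permutation puzzle stands in for the Rubik's Cube,
# and a constant function stands in for the learned value network.
import random

N_MOVES = 4                  # stand-in for the cube's larger move set
SOLVED = tuple(range(6))     # stand-in solved state

def apply_move(state, move):
    """Toy dynamics: each move swaps one fixed pair of positions."""
    pairs = [(0, 1), (1, 2), (2, 3), (3, 4)]
    i, j = pairs[move]
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def reward(state):
    """Sparse reward: +1 only when the state is solved, -1 otherwise."""
    return 1.0 if state == SOLVED else -1.0

def net_value(state):
    """Placeholder for the learned value estimate (untrained: returns 0)."""
    return 0.0

def adi_targets(scramble_depth=5):
    """Scramble outward from the solved state and, for each visited state,
    set the value target to the best one-step lookahead value and the
    policy target to the move that achieves it."""
    targets = []
    state = SOLVED
    for _ in range(scramble_depth):
        state = apply_move(state, random.randrange(N_MOVES))
        child_values = []
        for move in range(N_MOVES):
            child = apply_move(state, move)
            child_values.append(reward(child) + net_value(child))
        value_target = max(child_values)
        policy_target = child_values.index(value_target)
        targets.append((state, value_target, policy_target))
    return targets

if __name__ == "__main__":
    for s, v, p in adi_targets():
        print(s, v, p)
```

In a full system, a deep neural network would presumably replace net_value and be trained on these targets, with a tree search consulting it at solve time; this skeleton only shows the shape of the self-generated training data.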