Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning

There has been a recent explosion in the capabilities of game-playing artificial intelligence. Many classes of reinforcement learning (RL) tasks, from Atari games to motor control to board games, are now solvable by fairly generic deep learning algorithms that learn to play from experience with minimal knowledge of the specific domain. In this work, we investigate the performance of these methods on Super Smash Bros. Melee (SSBM), a popular console fighting game. The SSBM environment has complex dynamics and partial observability, making it challenging for human and machine alike. The multi-player aspect poses an additional challenge, as the vast majority of recent advances in RL have focused on single-agent environments. Nonetheless, we show that it is possible to train agents that are competitive with, and even surpass, human professionals, a new result for the multi-player video game setting.
