Behavioral Cloning in Atari Games Using a Combined Variational Autoencoder and Predictor Model

We explore an approach to behavioral cloning in video games. Our goal is a learning architecture that is data efficient and that supports both interpreting player strategies and replicating player actions in unseen situations. To this end, we develop a generative model that learns latent features of a game, which are then used to train an action predictor. Specifically, our architecture combines a Variational Autoencoder (VAE) with a predictor, a discriminative model that maps the latent space to action predictions. We compare our model's performance with that of two other behavioral cloning architectures: a discriminative model (a Convolutional Neural Network) that maps game states directly to actions, and a VAE with a separately trained predictor. Finally, we show how generative modeling lets us sample new states from the VAE's latent space in order to analyze player actions and attach meaning to particular latent features.
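The abstract does not state the training objective, so the following is only a minimal sketch of how a combined Variational Autoencoder and predictor could be trained jointly on one sample. It assumes a diagonal Gaussian posterior, a squared-error reconstruction term, and a cross-entropy term on the predicted action. The function names and the weights `beta` and `lam` are hypothetical and are not taken from the paper.

```python
import math


def kl_diag_gaussian(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over latent dims."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term (assumed here to be a sum of squared pixel errors)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def cross_entropy(logits, action):
    """Negative log-probability of the demonstrated action under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[action]


def joint_loss(x, x_hat, mu, log_var, logits, action, beta=1.0, lam=1.0):
    """Assumed joint objective: reconstruction + beta * KL + lam * action cross-entropy."""
    return (
        squared_error(x, x_hat)
        + beta * kl_diag_gaussian(mu, log_var)
        + lam * cross_entropy(logits, action)
    )
```

Under this reading, training the VAE and predictor separately would correspond to minimizing the first two terms first and then fitting the predictor with the encoder held fixed, whereas the combined model minimizes all three terms together so that the action signal also shapes the latent space.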
