The Impact of Determinism on Learning Atari 2600 Games

Pseudo-random number generation on the Atari 2600 was commonly accomplished with a Linear Feedback Shift Register (LFSR). One drawback was that the LFSR's initial seed had to be hard-coded into the ROM. To overcome this constraint, programmers sampled from the LFSR once per frame, including on title and end screens. Since a human player introduces a random delay between seeing the title screen and starting to play, the LFSR state was effectively randomized at the beginning of the game despite the hard-coded seed. Other games used the player's actions as a source of randomness. Notable pseudo-random games include Adventure, in which a bat randomly steals and hides items around the game world, and River Raid, which used randomness to make enemy movements less predictable.

Relying on the player as a source of randomness is not sufficient for computer-controlled agents, which are capable of memorizing and repeating pre-determined sequences of actions. Ideally, the games themselves would provide stochasticity generated from an external source such as the CPU clock; in practice, the hardware offered no such option. As a result, Atari games are deterministic: a fixed policy leads to a set sequence of actions. This article discusses different approaches for adding stochasticity to Atari games and examines how effective each approach is at derailing an agent known to memorize action sequences. Additionally, it is the authors' hope that this article will spark discussion in the community over the following questions:
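As an illustrative sketch (the tap polynomial and seed here are generic choices, not those of any particular cartridge), the scheme above can be modeled with an 8-bit maximal-length LFSR stepped once per frame. Because the register advances every frame of the title screen, two players who press start after different delays observe different "random" states, even though the seed in ROM is fixed:

```python
def lfsr_next(state: int) -> int:
    """One step of an 8-bit Galois LFSR.

    Toggle mask 0xB8 corresponds to the primitive polynomial
    x^8 + x^6 + x^5 + x^4 + 1, giving the maximal period of 255
    (every nonzero 8-bit state is visited before repeating).
    """
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB8
    return state


def state_after_delay(seed: int, frames: int) -> int:
    """LFSR state after the title screen has run for `frames` frames."""
    state = seed
    for _ in range(frames):
        state = lfsr_next(state)
    return state


SEED = 0x5A  # hypothetical hard-coded ROM seed

# Different human reaction times (in frames) yield different states,
# which is what made the hard-coded seed tolerable in practice.
print(state_after_delay(SEED, 30))
print(state_after_delay(SEED, 31))
```

A sequence-memorizing agent defeats exactly this mechanism: if it always presses start on the same frame, `state_after_delay` returns the same value every episode, and the game unfolds identically.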
