Attention Guided Deep Imitation Learning

When a learning agent attempts to imitate human visuomotor behaviors, it may benefit from knowing the human demonstrator's visual attention. Such information could clarify the goal of the demonstrator, since the attended object is the most likely target of the current action. Hence, it could help the agent better infer and learn the demonstrator's underlying state representation for decision making. We collect human control actions and eye-tracking data from humans playing Atari games. We train a deep neural network to predict human actions and show that including gaze information significantly improves prediction accuracy. In addition, a more biologically plausible representation of the gaze information further improves prediction accuracy.
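
To make the idea concrete, the sketch below shows one simple way gaze information could be fed to an action-prediction network: the recorded gaze positions are converted into a saliency map and stacked with the game frame as an extra input channel. This is an illustrative assumption, not the exact architecture described here; the frame size (84x84), layer sizes, and the 18-action output space are likewise placeholder choices.

```python
import torch
import torch.nn as nn

class GazeConditionedPolicy(nn.Module):
    """Predicts the demonstrator's action from a game frame plus a gaze map."""

    def __init__(self, num_actions: int = 18):
        super().__init__()
        # Input: 1 grayscale frame channel + 1 gaze saliency map channel.
        self.conv = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),  # logits over the demonstrator's actions
        )

    def forward(self, frame: torch.Tensor, gaze_map: torch.Tensor) -> torch.Tensor:
        # frame, gaze_map: (batch, 1, 84, 84); concatenate along the channel axis.
        x = torch.cat([frame, gaze_map], dim=1)
        return self.head(self.conv(x))

# Usage: train with cross-entropy against the human demonstrator's actions.
model = GazeConditionedPolicy()
frame = torch.rand(8, 1, 84, 84)
gaze_map = torch.rand(8, 1, 84, 84)
logits = model(frame, gaze_map)                                   # (8, 18)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 18, (8,)))
```

Dropping the gaze channel (i.e., feeding only the frame) gives the natural baseline against which the benefit of attention information can be measured.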