Automatic scanpath generation with deep recurrent neural networks

Many computer vision algorithms are biologically inspired and designed based on the human visual system. Convolutional neural networks (CNNs) are similarly inspired by the primary visual cortex in the human brain. However, the key difference between current visual models and the human visual system is how the visual information is gathered and processed. We make eye movements to collect information from the environment for navigation and task performance. We also make specific eye movements to important regions in the stimulus to perform the task-at-hand quickly and efficiently. Researchers have used expert scanpaths to train novices for improving the accuracy of visual search tasks. One of the limitations of such a system is that we need an expert to examine each visual stimuli beforehand to generate the scanpaths. In order to extend the idea of gaze guidance to a new unseen stimulus, there is a need for a computational model that can automatically generate expert-like scanpaths. We propose a model for automatic scanpath generation using a convolutional neural network (CNN) and long short-term memory (LSTM) modules. Our model uses LSTMs due to the temporal nature of eye movement data (scanpaths) where the system makes fixation predictions based on previous locations examined.

[1]  Sheriff Jolaoso,et al.  Scanpath comparison revisited , 2010, ETRA.

[2]  Ann McNamara,et al.  Subtle gaze manipulation for improved mammography training , 2011, APGV '11.

[3]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[4]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.