Visual salience and priority estimation for locomotion using a deep convolutional neural network

This paper presents a novel method for estimating salience and priority in the human visual system during locomotion, where the visual input contains dynamic content generated by a moving viewpoint. The priority map, which ranks key areas of the image, is created from probabilities of gaze fixations that merge bottom-up features with top-down control of locomotion. Two deep convolutional neural networks (CNNs), inspired by models of the primate visual system, are employed to capture local salience features and compute fixation probabilities. The first network operates on the foveal and peripheral areas around the gaze positions. The second network estimates the importance of fixated points with long durations or multiple visits, since such areas require extra processing time or rechecking to ensure smooth locomotion. The results show that the proposed method outperforms the state of the art by up to 30%, measured as the average of four well-known saliency estimation metrics.
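The fusion step described above, combining bottom-up fixation probabilities with top-down, task-driven ones into a single ranked priority map, can be sketched as a weighted sum followed by normalisation. This is an illustrative assumption for exposition, not the paper's exact fusion scheme; the function name, weight `w`, and toy inputs are hypothetical.

```python
import numpy as np

def priority_map(bottom_up, top_down, w=0.5):
    """Fuse two per-pixel fixation-probability maps into a priority map.

    bottom_up, top_down: 2-D arrays of fixation probabilities (e.g. the
    outputs of the two CNNs described in the abstract). `w` weights the
    bottom-up term; a simple weighted sum is an assumption here, standing
    in for whatever fusion the paper actually uses. The result is
    min-max normalised to [0, 1] so pixels can be ranked by priority.
    """
    fused = w * bottom_up + (1.0 - w) * top_down
    lo, hi = fused.min(), fused.max()
    if hi > lo:
        return (fused - lo) / (hi - lo)
    return np.zeros_like(fused)  # degenerate case: uniform input

# Toy 2x2 example: a visually salient peripheral region (bottom-up)
# versus a task-relevant path region ahead (top-down).
bu = np.array([[0.1, 0.9],
               [0.2, 0.1]])
td = np.array([[0.1, 0.1],
               [0.2, 0.8]])
pm = priority_map(bu, td)
```

With equal weights, the region salient in only one map still ranks above regions weak in both, which is the intended behaviour of a priority map that merges the two cues.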
