DeepFeat: A Bottom-Up and Top-Down Saliency Model Based on Deep Features of Convolutional Neural Networks

A deep feature-based saliency model (DeepFeat) is developed to better understand and predict human fixations. Conventional saliency models often predict human visual attention from only a few image cues. Although such models predict fixations on images of varying complexity, they are limited by the features they incorporate. In this paper, we utilize the deep features of convolutional neural networks by combining bottom-up (BU) and top-down (TD) saliency maps. The proposed framework is applied to the deep features of three popular deep convolutional neural networks (DCNNs). We use four evaluation metrics to measure the correspondence between the proposed saliency model and the ground-truth fixations over two datasets. The results demonstrate that the deep features of DCNNs pretrained on the ImageNet dataset are strong predictors of human fixations. Combining the BU and TD saliency maps outperforms the individual BU or TD implementations. Moreover, in a comparison with nine saliency models, including four state-of-the-art and five conventional saliency models, our proposed DeepFeat model outperforms the conventional saliency models on all four evaluation metrics.
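
The abstract does not say how the BU and TD maps are fused, so the following is only an illustrative sketch, not the DeepFeat procedure. It assumes both maps are already available as 2-D arrays of the same size and fuses them with a weighted average after min-max normalization. The function names `normalize_map` and `combine_bu_td` and the `bu_weight` parameter are hypothetical.

```python
import numpy as np


def normalize_map(saliency):
    """Rescale a saliency map to [0, 1] with min-max normalization."""
    saliency = np.asarray(saliency, dtype=np.float64)
    span = saliency.max() - saliency.min()
    if span == 0:
        # A constant map carries no saliency information.
        return np.zeros_like(saliency)
    return (saliency - saliency.min()) / span


def combine_bu_td(bu_map, td_map, bu_weight=0.5):
    """Fuse BU and TD maps by a weighted average (an assumed fusion rule)."""
    bu_map = np.asarray(bu_map, dtype=np.float64)
    td_map = np.asarray(td_map, dtype=np.float64)
    if bu_map.shape != td_map.shape:
        raise ValueError("BU and TD maps must have the same shape")
    if not 0.0 <= bu_weight <= 1.0:
        raise ValueError("bu_weight must lie in [0, 1]")
    combined = (bu_weight * normalize_map(bu_map)
                + (1.0 - bu_weight) * normalize_map(td_map))
    # Renormalize so the fused map again spans [0, 1].
    return normalize_map(combined)


if __name__ == "__main__":
    # Toy 2x2 maps standing in for real BU and TD saliency maps.
    bu = np.array([[0.0, 2.0], [4.0, 8.0]])
    td = np.array([[8.0, 4.0], [2.0, 0.0]])
    print(combine_bu_td(bu, td))
```

Setting `bu_weight` to 1.0 or 0.0 recovers the individual BU or TD map, which makes it easy to compare a fused map against either map alone under the same evaluation metric.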
