SalGAN: Visual Saliency Prediction with Generative Adversarial Networks

We introduce SalGAN, a deep convolutional neural network for visual saliency prediction trained with adversarial examples. The first stage of the network consists of a generator model whose weights are learned by back-propagation computed from a binary cross entropy (BCE) loss over downsampled versions of the saliency maps. The resulting prediction is processed by a discriminator network trained to solve a binary classification task between the saliency maps generated by the generative stage and the ground truth ones. Our experiments show how adversarial training allows reaching state-of-the-art performance across different metrics when combined with a widely-used loss function like BCE. Our results can be reproduced with the source code and trained models available at https://imatge-upc.github. io/saliency-salgan-2017/.

[1]  Yizhou Yu,et al.  Visual saliency based on multiscale deep features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[4]  Wojciech Matusik,et al.  Eye Tracking for Everyone , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Pietro Perona,et al.  Graph-Based Visual Saliency , 2006, NIPS.

[6]  Ali Borji,et al.  Boosting bottom-up and top-down visual features for saliency estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Christof Koch,et al.  Attentional Selection for Object Recognition - A Gentle Way , 2002, Biologically Motivated Computer Vision.

[8]  Srinivas S. Kruthiventi,et al.  Saliency Unified: A Deep Architecture for simultaneous Eye Fixation Prediction and Salient Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[11]  Jorma Laaksonen,et al.  Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features , 2016, Neurocomputing.

[12]  Junwei Han,et al.  A Deep Spatial Contextual Long-Term Recurrent Convolutional Network for Saliency Detection , 2016, IEEE Transactions on Image Processing.

[13]  Matthias Bethge,et al.  Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet , 2014, ICLR.

[14]  R. Venkatesh Babu,et al.  DeepFix: A Fully Convolutional Neural Network for Predicting Human Eye Fixations , 2015, IEEE Transactions on Image Processing.

[15]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Matthias Bethge,et al.  Information-theoretic model comparison unifies saliency metrics , 2015, Proceedings of the National Academy of Sciences.

[17]  Stan Sclaroff,et al.  Saliency Detection: A Boolean Map Approach , 2013, 2013 IEEE International Conference on Computer Vision.

[18]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[19]  Matthias Bethge,et al.  DeepGaze II: Reading fixations from deep features trained on object recognition , 2016, ArXiv.

[20]  Qi Zhao,et al.  SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  Frédo Durand,et al.  Where Should Saliency Models Look Next? , 2016, ECCV.

[22]  Naila Murray,et al.  End-to-End Saliency Mapping via Probability Distribution Prediction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Rita Cucchiara,et al.  A deep multi-level network for saliency prediction , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[25]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[26]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[27]  Nicolas Riche,et al.  Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics , 2013, 2013 IEEE International Conference on Computer Vision.

[28]  Qi Zhao,et al.  SALICON: Saliency in Context , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Frédo Durand,et al.  What Do Different Evaluation Metrics Tell Us About Saliency Models? , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Noel E. O'Connor,et al.  Shallow and Deep Convolutional Networks for Saliency Prediction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Michael Dorr,et al.  Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.