Deep Saliency Models : The Quest For The Loss Function

Recent advances in deep learning have pushed the performances of visual saliency models way further than it has ever been. Numerous models in the literature present new ways to design neural networks, to arrange gaze pattern data, or to extract as much high and low-level image features as possible in order to create the best saliency representation. However, one key part of a typical deep learning model is often neglected: the choice of the loss function. In this work, we explore some of the most popular loss functions that are used in deep saliency models. We demonstrate that on a fixed network architecture, modifying the loss function can significantly improve (or depreciate) the results, hence emphasizing the importance of the choice of the loss function when designing a model. We also introduce new loss functions that have never been used for saliency prediction to our knowledge. And finally, we show that a linear combination of several well-chosen loss functions leads to significant improvements in performances on different datasets as well as on a different network architecture, hence demonstrating the robustness of a combined metric.

[1]  O. Meur,et al.  Predicting visual fixations on video based on low-level visual features , 2007, Vision Research.

[2]  Matthias Bethge,et al.  Information-theoretic model comparison unifies saliency metrics , 2015, Proceedings of the National Academy of Sciences.

[3]  Ali Borji,et al.  CAT2000: A Large Scale Fixation Dataset for Boosting Saliency Research , 2015, ArXiv.

[4]  O. Meur,et al.  Introducing context-dependent and spatially-variant viewing biases in saccadic models , 2016, Vision Research.

[5]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[6]  Jorma Laaksonen,et al.  Saliency Revisited: Analysis of Mouse Movements Versus Fixations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8]  Leon A. Gatys,et al.  Understanding Low- and High-Level Contributions to Fixation Prediction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Stan Sclaroff,et al.  Saliency Detection: A Boolean Map Approach , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Qi Zhao,et al.  SALICON: Saliency in Context , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[12]  Patrick Le Callet,et al.  A coherent computational approach to model bottom-up visual attention , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  John K. Tsotsos,et al.  Saliency Based on Information Maximization , 2005, NIPS.

[14]  Frédo Durand,et al.  Where Should Saliency Models Look Next? , 2016, ECCV.

[15]  Naila Murray,et al.  End-to-End Saliency Mapping via Probability Distribution Prediction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Rita Cucchiara,et al.  Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model , 2016, IEEE Transactions on Image Processing.

[17]  Aykut Erdem,et al.  Spatio-Temporal Saliency Networks for Dynamic Saliency Prediction , 2016, IEEE Transactions on Multimedia.

[18]  Thierry Baccino,et al.  Methods for comparing scanpaths and saliency maps: strengths and weaknesses , 2012, Behavior Research Methods.

[19]  Ya Zhang,et al.  An Element Sensitive Saliency Model with Position Prior Learning for Web Pages , 2018, ICIAI 2019.

[20]  Jorma Laaksonen,et al.  Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features , 2016, Neurocomputing.

[21]  Antón García-Díaz,et al.  Saliency from hierarchical adaptation through decorrelation and variance normalization , 2012, Image Vis. Comput..

[22]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[23]  Matthias Bethge,et al.  DeepGaze II: Reading fixations from deep features trained on object recognition , 2016, ArXiv.

[24]  Roberto Cipolla,et al.  Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Nicolas Riche,et al.  Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics , 2013, 2013 IEEE International Conference on Computer Vision.

[26]  Rita Cucchiara,et al.  A deep multi-level network for saliency prediction , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[27]  Sen Jia,et al.  EML-NET: An Expandable Multi-Layer NETwork for Saliency Prediction , 2018, Image Vis. Comput..

[28]  Christof Koch,et al.  Image Signature: Highlighting Sparse Salient Regions , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Noel E. O'Connor,et al.  Shallow and Deep Convolutional Networks for Saliency Prediction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Pietro Perona,et al.  Graph-Based Visual Saliency , 2006, NIPS.

[31]  Qi Zhao,et al.  SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Matthias Bethge,et al.  Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet , 2014, ICLR.

[33]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[34]  Zhi Liu,et al.  Saccadic model of eye movements for free-viewing condition , 2015, Vision Research.

[35]  Wenguan Wang,et al.  Deep Visual Attention Prediction , 2017, IEEE Transactions on Image Processing.

[36]  Matthias Bethge,et al.  Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics , 2017, ECCV.

[37]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Leon A. Gatys,et al.  A Neural Algorithm of Artistic Style , 2015, ArXiv.

[39]  Ali Borji,et al.  Analysis of Scores, Datasets, and Models in Visual Saliency Prediction , 2013, 2013 IEEE International Conference on Computer Vision.

[40]  Neil D. B. Bruce,et al.  A Deeper Look at Saliency: Feature Contrast, Semantics, and Beyond , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Asha Iyer,et al.  Components of bottom-up gaze allocation in natural images , 2005, Vision Research.

[42]  Haibin Ling,et al.  Revisiting Video Saliency Prediction in the Deep Learning Era , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Qi Zhao,et al.  Webpage Saliency , 2014, ECCV.

[44]  Ruigang Yang,et al.  Inferring Salient Objects from Human Fixations , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Ali Borji,et al.  State-of-the-Art in Visual Attention Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Ali Borji,et al.  Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study , 2013, IEEE Transactions on Image Processing.

[47]  Noel E. O'Connor,et al.  SalGAN: Visual Saliency Prediction with Generative Adversarial Networks , 2017, ArXiv.

[48]  Ling Shao,et al.  Video Salient Object Detection via Fully Convolutional Networks , 2017, IEEE Transactions on Image Processing.

[49]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[50]  Ali Borji,et al.  Understanding and Visualizing Deep Visual Saliency Models , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Junwei Han,et al.  A Deep Spatial Contextual Long-Term Recurrent Convolutional Network for Saliency Detection , 2016, IEEE Transactions on Image Processing.

[52]  Qi Zhao,et al.  Learning saliency-based visual attention: A review , 2013, Signal Process..

[53]  Ali Borji,et al.  Saliency Prediction in the Deep Learning Era: An Empirical Investigation , 2018, ArXiv.

[54]  Michael Dorr,et al.  Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.