Activity guided multi-scales collaboration based on scaled-CNN for saliency prediction

Abstract

Visual saliency prediction has improved significantly with the advent of convolutional neural networks (CNNs), but the gains in prediction accuracy come at a high computational cost. In this paper, we present a lightweight saliency prediction model based on scaled-up CNNs that uses image-activity-guided collaborative learning of global and local information at multiple scales. We use a pseudo-siamese network with a scaled-up network (EfficientNet) as the backbone, whose two branches capture global saliency features and high-level local features, respectively. Concretely, we first use image-complexity-related activity features (the Image Activity Measure) as a low-level local saliency prior, and then feed the input images and the activity maps to scaled-up CNN modules to learn high-level features in a multi-scale collaborative manner. Extensive evaluation shows that the proposed method achieves competitive and consistent results on challenging benchmark datasets, with better prediction performance, fewer trainable parameters, and faster inference. Moreover, the proposed model places low demands on platform computing capability, which broadens the range of scenarios in which saliency prediction can be applied.
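The activity prior mentioned above can be illustrated with a small sketch. It assumes that image activity is measured as the mean absolute difference between horizontally and vertically adjacent pixels, evaluated on non-overlapping blocks of a grayscale image. The function names, the block size, and the block-wise layout are illustrative assumptions, not details taken from the paper.

```python
def image_activity(block):
    """Mean absolute difference between adjacent pixels in a 2-D block.

    Assumed definition: sum of vertical and horizontal neighbour
    differences, divided by the number of pixels in the block.
    """
    rows, cols = len(block), len(block[0])
    vertical = sum(abs(block[i][j] - block[i + 1][j])
                   for i in range(rows - 1) for j in range(cols))
    horizontal = sum(abs(block[i][j] - block[i][j + 1])
                     for i in range(rows) for j in range(cols - 1))
    return (vertical + horizontal) / (rows * cols)


def activity_map(image, block_size=2):
    """Block-wise activity map (hypothetical helper).

    Returns one activity value per non-overlapping block; partial
    blocks at the right and bottom edges are ignored.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for top in range(0, rows - block_size + 1, block_size):
        row_values = []
        for left in range(0, cols - block_size + 1, block_size):
            block = [r[left:left + block_size]
                     for r in image[top:top + block_size]]
            row_values.append(image_activity(block))
        result.append(row_values)
    return result


# A 4x4 grayscale image: only the top-right block contains texture.
image = [
    [0, 0, 10, 0],
    [0, 0, 0, 10],
    [5, 5, 5, 5],
    [5, 5, 5, 5],
]
amap = activity_map(image, block_size=2)
```

Flat regions yield zero activity and textured regions yield larger values, which is the sense in which such a map could serve as a low-level local prior alongside the input image.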
