ABSNet: Aesthetics-Based Saliency Network Using Multi-Task Convolutional Network

As a smart visual attention mechanism for analyzing visual scenes, visual saliency has been shown to correlate closely with semantic information such as faces. Although many semantic-information-guided saliency models have been proposed, to the best of our knowledge, no semantic information from the affective domain has been employed for saliency detection. Aesthetics, the affective perceptual quality that integrates factors such as scene composition and contrast, can certainly benefit visual attention, which depends strongly on these visual factors. In this letter, we propose an end-to-end multi-task framework called the aesthetics-based saliency network (ABSNet). We use three commonly used shared backbones and design two distinct branches, one for each task. Mean squared error (MSE) loss and Earth Mover's Distance (EMD) loss are jointly adopted to alternately train the shared network and the individual branches for the different tasks, enabling the proposed model to extract more effective features for visual perception. Moreover, our model is resolution-friendly and can predict saliency for images of arbitrary size. Experiments show that the proposed multi-task method is superior to its single-task counterpart and outperforms state-of-the-art saliency methods.
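To make the training scheme concrete, below is a minimal sketch (in PyTorch) of alternating multi-task training: a shared backbone feeds a saliency branch supervised by MSE loss and an aesthetics branch supervised by a squared EMD loss over a distribution of aesthetic score bins. The module names, the 512-channel feature assumption, and the ten score bins are illustrative assumptions, not the authors' exact architecture.

# Sketch only: shared backbone + two task branches, trained alternately.
# Shapes, layer choices, and bin count are assumptions for illustration.
import torch
import torch.nn as nn

class ABSNetSketch(nn.Module):
    def __init__(self, backbone, num_aesthetic_bins=10):
        super().__init__()
        self.backbone = backbone                   # shared feature extractor (assumed 512-channel output)
        self.saliency_head = nn.Conv2d(512, 1, 1)  # per-pixel saliency prediction
        self.aesthetic_head = nn.Sequential(       # distribution over aesthetic score bins
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(512, num_aesthetic_bins), nn.Softmax(dim=1))

    def forward(self, x):
        feats = self.backbone(x)
        return torch.sigmoid(self.saliency_head(feats)), self.aesthetic_head(feats)

def emd_loss(pred, target):
    # Squared Earth Mover's Distance between two discrete distributions,
    # computed from their cumulative distribution functions.
    cdf_diff = torch.cumsum(pred, dim=1) - torch.cumsum(target, dim=1)
    return torch.mean(torch.sum(cdf_diff ** 2, dim=1))

def train_step(model, optimizer, batch, task):
    # Alternate the task between steps: the shared backbone receives gradients
    # from both losses over time, while each branch is updated only by its own loss.
    images, saliency_gt, aesthetic_gt = batch
    saliency_pred, aesthetic_pred = model(images)
    if task == "saliency":
        loss = nn.functional.mse_loss(saliency_pred, saliency_gt)
    else:
        loss = emd_loss(aesthetic_pred, aesthetic_gt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()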
