Contrast-Oriented Deep Neural Networks for Salient Object Detection

Deep convolutional neural networks (CNNs) have become a key element in the recent breakthrough of salient object detection. However, existing CNN-based methods are based on either patchwise (regionwise) training and inference or fully convolutional networks. Methods in the former category are generally time-consuming due to severe storage and computational redundancies among overlapping patches. To overcome this deficiency, methods in the second category attempt to directly map a raw input image to a predicted dense saliency map in a single network forward pass. Though being very efficient, it is arduous for these methods to detect salient objects of different scales or salient regions with weak semantic information. In this paper, we develop hybrid contrast-oriented deep neural networks to overcome the aforementioned limitations. Each of our deep networks is composed of two complementary components, including a fully convolutional stream for dense prediction and a segment-level spatial pooling stream for sparse saliency inference. We further propose an attentional module that learns weight maps for fusing the two saliency predictions from these two streams. A tailored alternate scheme is designed to train these deep networks by fine-tuning pretrained baseline models. Finally, a customized fully connected conditional random field model incorporating a salient contour feature embedding can be optionally applied as a postprocessing step to improve spatial coherence and contour positioning in the fused result from these two streams. Extensive experiments on six benchmark data sets demonstrate that our proposed model can significantly outperform the state of the art in terms of all popular evaluation metrics.

[1]  Pingkun Yan,et al.  Visual Saliency by Selective Contrast , 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[2]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Yizhou Yu,et al.  Deep Contrast Learning for Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Patrick Pérez,et al.  Geodesic image and video editing , 2010, TOGS.

[5]  Xiaogang Wang,et al.  Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification , 2014, ArXiv.

[6]  Shiguang Shan,et al.  Adaptive Partial Differential Equation Learning for Visual Saliency Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[8]  Liang Lin,et al.  Multi-label Image Recognition by Recurrently Discovering Attentional Regions , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[10]  Liqing Zhang,et al.  Saliency Detection: A Spectral Residual Approach , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Yael Pritch,et al.  Saliency filters: Contrast based filtering for salient region detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Junwei Han,et al.  DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[14]  Huchuan Lu,et al.  Saliency Detection via Graph-Based Manifold Ranking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Yunchao Wei,et al.  STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Lihi Zelnik-Manor,et al.  Context-aware saliency detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[18]  Yuan Xie,et al.  Weakly Supervised Salient Object Detection Using Image Labels , 2018, AAAI.

[19]  Yueting Zhuang,et al.  DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection , 2015, IEEE Transactions on Image Processing.

[20]  Derrick J. Parkhurst,et al.  Modeling the role of salience in the allocation of overt visual attention , 2002, Vision Research.

[21]  Yuan Xie,et al.  Instance-Level Salient Object Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[23]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Huchuan Lu,et al.  Deep networks for saliency detection via local estimation and global search , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Peng Jiang,et al.  Salient Region Detection by UFO: Uniqueness, Focusness and Objectness , 2013, 2013 IEEE International Conference on Computer Vision.

[26]  Yi Yang,et al.  Attention to Scale: Scale-Aware Semantic Image Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Yizhou Yu,et al.  Visual Saliency Detection Based on Multiscale Deep CNN Features , 2016, IEEE Transactions on Image Processing.

[28]  Nanning Zheng,et al.  Learning to Detect a Salient Object , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Hefeng Wu,et al.  Weighted attentional blocks for probabilistic object tracking , 2013, The Visual Computer.

[30]  S. Avidan,et al.  Seam carving for content-aware image resizing , 2007, SIGGRAPH 2007.

[31]  Shang-Hong Lai,et al.  Fusing generic objectness and visual saliency for salient object detection , 2011, 2011 International Conference on Computer Vision.

[32]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[33]  Huchuan Lu,et al.  Saliency Detection with Recurrent Fully Convolutional Networks , 2016, ECCV.

[34]  Liang Lin,et al.  PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures With Edge-Preserving Coherence , 2015, IEEE Transactions on Image Processing.

[35]  Ying Wu,et al.  A unified approach to salient object detection via low rank matrix recovery , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  S. Süsstrunk,et al.  Frequency-tuned salient region detection , 2009, CVPR 2009.

[37]  S. Mallat A wavelet tour of signal processing , 1998 .

[38]  Li Xu,et al.  Hierarchical Saliency Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[40]  Yizhou Yu,et al.  Visual saliency based on multiscale deep features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Antonio Criminisi,et al.  TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[42]  Huchuan Lu,et al.  Saliency detection via Cellular Automata , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[44]  Nianyi Li,et al.  A weighted sparse coding framework for saliency detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Gang Wang,et al.  Recurrent Attentional Networks for Saliency Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Yizhou Yu,et al.  Person Re-Identification Using Multiple Experts with Random Subspaces , 2014 .

[48]  智一 吉田,et al.  Efficient Graph-Based Image Segmentationを用いた圃場図自動作成手法の検討 , 2014 .

[49]  James M. Rehg,et al.  The Secrets of Salient Object Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[51]  Nuno Vasconcelos,et al.  Learning Optimal Seeds for Diffusion-Based Salient Object Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Mei Han,et al.  Category-Independent Object-Level Saliency Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[53]  Laurent Itti,et al.  An Integrated Model of Top-Down and Bottom-Up Attention for Optimizing Detection Speed , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[54]  Simone Frintrop,et al.  Center-surround divergence of feature statistics for salient object detection , 2011, 2011 International Conference on Computer Vision.

[55]  Jian Sun,et al.  Saliency Optimization from Robust Background Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Nuno Vasconcelos,et al.  Bottom-up saliency is a discriminant process , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[57]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[58]  Yuzhen Niu,et al.  Saliency Aggregation: A Data-Driven Approach , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[60]  Xuelong Li,et al.  Two-Stage Learning to Predict Human Eye Fixations via SDAEs , 2016, IEEE Transactions on Cybernetics.

[61]  Weisi Lin,et al.  A Universal Framework for Salient Object Detection , 2016, IEEE Transactions on Multimedia.

[62]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[63]  Xiaogang Wang,et al.  Saliency detection by multi-context deep learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[66]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[67]  P. König,et al.  Does luminance‐contrast contribute to a saliency map for overt visual attention? , 2003, The European journal of neuroscience.