SuperCNN: A Superpixelwise Convolutional Neural Network for Salient Object Detection

Existing computational models for salient object detection rely primarily on hand-crafted features, which capture only low-level contrast information. In this paper, we learn hierarchical contrast features by formulating salient object detection as a binary labeling problem and solving it with deep learning techniques. We propose a novel superpixelwise convolutional neural network, called SuperCNN, that learns the internal representations of saliency efficiently. In contrast to classical convolutional networks, SuperCNN has four main properties. First, it learns hierarchical contrast features because it is fed two meaningful superpixel sequences, which is much more effective for detecting salient regions than feeding raw image pixels. Second, because SuperCNN recovers the contextual information among superpixels, it can efficiently involve a large context in the analysis. Third, owing to the superpixelwise mechanism, the number of predictions required for a densely labeled map is greatly reduced. Fourth, a multiscale network structure allows saliency to be detected independently of region size. Experiments show that SuperCNN robustly detects salient objects and outperforms state-of-the-art methods on three benchmark datasets.
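The third property can be illustrated with a small sketch. It is not the SuperCNN implementation: the function names, the square-grid partition (a crude stand-in for a real superpixel algorithm), the image size, and the random scores (placeholders for network outputs) are all assumptions made for illustration. It shows only that one prediction per superpixel, broadcast to the pixels that superpixel covers, still yields a dense map.

```python
import numpy as np

def grid_superpixels(height, width, cell):
    """Label each pixel with the index of the square cell containing it.

    Hypothetical stand-in for a real superpixel algorithm.
    """
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    n_cols = -(-width // cell)  # ceiling division: cells per row
    return rows[:, None] * n_cols + cols[None, :]

def dense_map_from_superpixels(labels, scores):
    """Broadcast one score per superpixel to every pixel it covers."""
    return scores[labels]

# Assumed toy dimensions, chosen only for the example.
height, width, cell = 120, 160, 20
labels = grid_superpixels(height, width, cell)
n_superpixels = int(labels.max()) + 1

# Placeholder scores; in a real system these would be per-superpixel predictions.
rng = np.random.default_rng(0)
scores = rng.random(n_superpixels)

saliency = dense_map_from_superpixels(labels, scores)
pixel_predictions = height * width  # predictions needed for per-pixel labeling
```

With these assumed dimensions, a per-pixel approach would need `pixel_predictions` separate predictions, whereas the superpixelwise map needs only `n_superpixels`.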