Learning to Detect Salient Objects with Image-Level Supervision

Deep Neural Networks (DNNs) have substantially improved the state-of-the-art in salient object detection. However, training DNNs requires costly pixel-level annotations. In this paper, we leverage the observation that image-level tags provide important cues of foreground salient objects, and develop a weakly supervised learning method for saliency detection using image-level tags only. The Foreground Inference Network (FIN) is introduced for this challenging task. In the first stage of our training method, FIN is jointly trained with a fully convolutional network (FCN) for image-level tag prediction. A global smooth pooling layer is proposed, enabling FCN to assign object category tags to corresponding object regions, while FIN is capable of capturing all potential foreground regions with the predicted saliency maps. In the second stage, FIN is fine-tuned with its predicted saliency maps as ground truth. For refinement of ground truth, an iterative Conditional Random Field is developed to enforce spatial label consistency and further boost performance. Our method alleviates annotation efforts and allows the usage of existing large scale training sets with image-level tags. Our model runs at 60 FPS, outperforms unsupervised ones with a large margin, and achieves comparable or even superior performance than fully supervised counterparts.

[1]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[2]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[3]  Nuno Vasconcelos,et al.  Learning Optimal Seeds for Diffusion-Based Salient Object Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Jingdong Wang,et al.  Salient Object Detection: A Discriminative Regional Feature Integration Approach , 2013, International Journal of Computer Vision.

[5]  Li Xu,et al.  Hierarchical Saliency Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[7]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Yurii Nesterov,et al.  Smooth minimization of non-smooth functions , 2005, Math. Program..

[9]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[10]  Huchuan Lu,et al.  Saliency Detection via Graph-Based Manifold Ranking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Huchuan Lu,et al.  Saliency Detection via Dense and Sparse Reconstruction , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[13]  Nanning Zheng,et al.  Learning to Detect a Salient Object , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Ming-Hsuan Yang,et al.  DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection , 2015, IEEE Transactions on Image Processing.

[15]  Huchuan Lu,et al.  Saliency Detection with Recurrent Fully Convolutional Networks , 2016, ECCV.

[16]  Yizhou Yu,et al.  Visual saliency based on multiscale deep features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Trevor Darrell,et al.  Do Convnets Learn Correspondence? , 2014, NIPS.

[18]  Huchuan Lu,et al.  Saliency detection via Cellular Automata , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Jian Sun,et al.  Saliency Optimization from Robust Background Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[23]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Yu-Wing Tai,et al.  Salient Region Detection via High-Dimensional Color Transform , 2014, CVPR.

[25]  Yoram Singer,et al.  Efficient projections onto the l1-ball for learning in high dimensions , 2008, ICML '08.

[26]  Ronen Basri,et al.  Image Segmentation by Probabilistic Bottom-Up Aggregation and Cue Integration , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Xiaogang Wang,et al.  Saliency detection by multi-context deep learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Xiaogang Wang,et al.  STCT: Sequentially Training Convolutional Networks for Visual Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Deyu Meng,et al.  Co-Saliency Detection via a Self-Paced Multiple-Instance Learning Framework , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Huchuan Lu,et al.  Deep networks for saliency detection via local estimation and global search , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yizhou Yu,et al.  Deep Contrast Learning for Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Shi-Min Hu,et al.  SalientShape: group saliency in image collections , 2013, The Visual Computer.

[34]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[35]  Yueting Zhuang,et al.  DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection , 2015, IEEE Transactions on Image Processing.

[36]  Ali Borji,et al.  Salient object detection: A survey , 2014, Computational Visual Media.

[37]  Radomír Mech,et al.  Minimum Barrier Salient Object Detection at 80 FPS , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[38]  Trevor Darrell,et al.  Fully Convolutional Multi-Class Multiple Instance Learning , 2014, ICLR.

[39]  Gang Wang,et al.  Recurrent Attentional Networks for Saliency Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Huaizu Jiang,et al.  Weakly Supervised Learning for Salient Object Detection , 2015 .

[41]  Bharath Hariharan,et al.  Low-shot visual object recognition , 2016, ArXiv.

[42]  Xiaochun Cao,et al.  Cluster-Based Co-Saliency Detection , 2013, IEEE Transactions on Image Processing.

[43]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[44]  Junwei Han,et al.  DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[47]  Jian Sun,et al.  Convolutional feature masking for joint object and stuff segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Huchuan Lu,et al.  Pattern Mining Saliency , 2016, ECCV.

[49]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[50]  James M. Rehg,et al.  The Secrets of Salient Object Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  S. Süsstrunk,et al.  Frequency-tuned salient region detection , 2009, CVPR 2009.

[52]  Christoph H. Lampert,et al.  Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation , 2016, ECCV.

[53]  Xuelong Li,et al.  DISC: Deep Image Saliency Computing via Progressive Representation Learning , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[54]  Gayoung Lee,et al.  Deep Saliency with Encoded Low Level Distance Map and High Level Features , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Dumitru Erhan,et al.  Training Deep Neural Networks on Noisy Labels with Bootstrapping , 2014, ICLR.

[56]  Zaïd Harchaoui,et al.  On learning to localize objects with minimal supervision , 2014, ICML.

[57]  Nikos Komodakis,et al.  HARF: Hierarchy-Associated Rich Features for Salient Object Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[58]  Bernt Schiele,et al.  Weakly Supervised Object Boundaries , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).