Weakly Supervised Top-down Salient Object Detection

Top-down saliency models produce a probability map that peaks at target locations specified by a task/goal such as object detection. They are usually trained in a fully supervised setting involving pixel-level annotations of objects. We propose a weakly supervised top-down saliency framework using only binary labels that indicate the presence/absence of an object in an image. First, the probabilistic contribution of each image region to the confidence of a CNN-based image classifier is computed through a backtracking strategy to produce top-down saliency. From a set of saliency maps of an image produced by fast bottom-up saliency approaches, we select the best saliency map suitable for the top-down task. The selected bottom-up saliency map is combined with the top-down saliency map. Features having high combined saliency are used to train a linear SVM classifier to estimate feature saliency. This is integrated with combined saliency and further refined through a multi-scale superpixel-averaging of saliency map. We evaluate the performance of the proposed weakly supervised top-down saliency against fully supervised approaches and achieve state-of-the-art performance. Experiments are carried out on seven challenging datasets and quantitative results are compared with 36 closely related approaches across 4 different applications.

[1]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Vladimir Kolmogorov,et al.  "GrabCut": interactive foreground extraction using iterated graph cuts , 2004, ACM Trans. Graph..

[3]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Takeo Kanade,et al.  Distributed cosegmentation via submodular optimization on anisotropic diffusion , 2011, 2011 International Conference on Computer Vision.

[5]  James M. Rehg,et al.  The Secrets of Salient Object Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Nanning Zheng,et al.  Learning to Detect a Salient Object , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[8]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[10]  Huchuan Lu,et al.  Saliency Detection via Graph-Based Manifold Ranking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Ming-Hsuan Yang,et al.  Top-down visual saliency via joint CRF and dictionary learning , 2012, CVPR.

[12]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Natasha Gelfand,et al.  A survey of image retargeting techniques , 2010, Optical Engineering + Applications.

[14]  Jean Ponce,et al.  Discriminative clustering for image co-segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Peter Auer,et al.  Generic object recognition with boosting , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Nuno Vasconcelos,et al.  Discriminant Saliency, the Detection of Suspicious Coincidences, and Applications to Visual Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Ramón López de Mántaras,et al.  Fast and robust object segmentation with the Integral Linear Classifier , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Deepu Rajan,et al.  Top-down saliency with Locality-constrained Contextual Sparse Coding , 2015, BMVC.

[19]  Aykut Erdem,et al.  Top down saliency estimation via superpixel-based discriminative dictionaries , 2014, BMVC.

[20]  Junwei Han,et al.  DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Thomas Brox,et al.  Inverting Convolutional Networks with Convolutional Networks , 2015, ArXiv.

[22]  Matthew H Tong,et al.  SUN: Top-down saliency using natural statistics , 2009, Visual cognition.

[23]  Ce Liu,et al.  Unsupervised Joint Object Discovery and Segmentation in Internet Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Rynson W. H. Lau,et al.  Exemplar-Driven Top-Down Saliency Detection via Deep Association , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Radomír Mech,et al.  Minimum Barrier Salient Object Detection at 80 FPS , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Shai Shalev-Shwartz,et al.  Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..

[28]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[30]  Nazar Khan,et al.  Discriminative dictionary learning with spatial priors , 2013, 2013 IEEE International Conference on Image Processing.

[31]  Rui Zhang,et al.  Top-Down Saliency Detection via Contextual Pooling , 2014, J. Signal Process. Syst..

[32]  Moncef Gabbouj,et al.  Visual saliency by extended quantum cuts , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[33]  Ali Borji,et al.  Salient Object Detection: A Benchmark , 2015, IEEE Transactions on Image Processing.

[34]  Ivan Laptev,et al.  Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[36]  Huchuan Lu,et al.  Deep networks for saliency detection via local estimation and global search , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Xuelong Li,et al.  Detection of Co-salient Objects by Looking Deep and Wide , 2016, International Journal of Computer Vision.

[38]  Fei-Fei Li,et al.  Combining randomization and discrimination for fine-grained image categorization , 2011, CVPR 2011.

[39]  Stefano Soatto,et al.  Class segmentation and object localization with superpixel neighborhoods , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[40]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[43]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[44]  Trevor Darrell,et al.  Fully Convolutional Multi-Class Multiple Instance Learning , 2014, ICLR.

[45]  George Papandreou,et al.  Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation , 2015, ArXiv.

[46]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Yizhou Yu,et al.  Visual saliency based on multiscale deep features , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Shao-Yi Chien,et al.  Real-Time Salient Object Detection with a Minimum Spanning Tree , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Cordelia Schmid,et al.  Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[51]  Andrea Vedaldi,et al.  Weakly Supervised Deep Detection Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).