Fusing Saliency Maps with Region Proposals for Unsupervised Object Localization

In this paper we address the problem of unsupervised localization of objects in single images. Compared to previous state-of-the-art method our method is fully unsupervised in the sense that there is no prior instance level or category level information about the image. Furthermore, we treat each image individually and do not rely on any neighboring image similarity. We employ deep-learning based generation of saliency maps and region proposals to tackle this problem. First salient regions in the image are determined using an encoder/decoder architecture. The resulting saliency map is matched with region proposals from a class agnostic region proposal network to roughly localize the candidate object regions. These regions are further refined based on the overlap and similarity ratios. Our experimental evaluations on a benchmark dataset show that the method gets close to current state-of-the-art methods in terms of localization accuracy even though these make use of multiple frames. Furthermore, we created a more challenging and realistic dataset with multiple object categories and varying viewpoint and illumination conditions for evaluating the method's performance in real world scenarios.

[1]  Thomas Deselaers,et al.  Localizing Objects While Learning Their Appearance , 2010, ECCV.

[2]  Matthew B. Blaschko,et al.  Unsupervised Spatio-Temporal Segmentation with Sparse Spectral-Clustering , 2014, BMVC.

[3]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[4]  Ce Liu,et al.  Unsupervised Joint Object Discovery and Segmentation in Internet Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Ali Borji,et al.  Salient Object Detection: A Benchmark , 2015, IEEE Transactions on Image Processing.

[6]  Cordelia Schmid,et al.  Multi-fold MIL Training for Weakly Supervised Object Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Markus Vincze,et al.  Saliency-based object discovery on RGB-D data with a late-fusion approach , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[8]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Gert Kootstra,et al.  Using Symmetry to Select Fixation Points for Segmentation , 2010, 2010 20th International Conference on Pattern Recognition.

[10]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[12]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Fei-Fei Li,et al.  Efficient Image and Video Co-localization with Frank-Wolfe Algorithm , 2014, ECCV.

[14]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[15]  Nikolaus Correll,et al.  Simultaneous localization, mapping, and manipulation for unsupervised object discovery , 2014, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[16]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Sergio Caccamo,et al.  3D Object Discovery and Modeling Using Single RGB-D Images Containing Multiple Object Instances , 2017, 2017 International Conference on 3D Vision (3DV).

[18]  Cordelia Schmid,et al.  Unsupervised object discovery and localization in the wild: Part-based matching with bottom-up region proposals , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Simone Frintrop,et al.  Sequence-level object candidates based on saliency for generic object recognition on mobile systems , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[20]  Danica Kragic,et al.  Active 3D scene segmentation and detection of unknown objects , 2010, 2010 IEEE International Conference on Robotics and Automation.

[21]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[22]  Gert Kootstra,et al.  Fast and Automatic Detection and Segmentation of unknown objects , 2010, 2010 10th IEEE-RAS International Conference on Humanoid Robots.

[23]  Thomas A. Funkhouser,et al.  Dilated Residual Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[25]  Gert Kootstra,et al.  Fast and bottom-up object detection, segmentation, and evaluation using Gestalt principles , 2011, 2011 IEEE International Conference on Robotics and Automation.

[26]  Markus Vincze,et al.  Attention-driven object detection and segmentation of cluttered table scenes using 2.5D symmetry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[27]  Yinda Zhang,et al.  LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop , 2015, ArXiv.

[28]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Loong Fah Cheong,et al.  Active segmentation with fixation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[30]  Michael Isard,et al.  Object localization by Bayesian correlation , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.