Common Landmark Discovery for Object-Level View Image Retrieval: Modeling and Matching of Scenes via Bag-of-Bounding-Boxes

Object-level view image retrieval for robot vision applications has been actively studied recently, as they can provide semantic and compact method for efficient scene matching. In existing frameworks, landmark objects are extracted from an input view image by a pool of pretrained object detectors, and used as an image representation. To improve the compactness and autonomy of object-level view image retrieval, we here present a novel method called ``common landmark discovery". Under this method, landmark objects are mined through common pattern discovery (CPD) between an input image and known reference images. This approach has three distinct advantages. First, the CPD-based object detection is unsupervised, and does not require pretrained object detector. Second, the method attempts to find fewer and larger object patterns, which leads to a compact and semantically robust view image descriptor. Third, the scene matching problem is efficiently solved as a lower-dimensional problem of computing region overlaps between landmark objects, using a compact image representation in a form of bag-of-bounding-boxes (BoBB).

[1]  Michal Perdoch,et al.  Efficient sequential correspondence selection by cosegmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Andrew Zisserman,et al.  Near Duplicate Image Detection: min-Hash and tf-idf Weighting , 2008, BMVC.

[3]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[4]  Takeo Kanade,et al.  Distributed cosegmentation via submodular optimization on anisotropic diffusion , 2011, 2011 International Conference on Computer Vision.

[5]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[6]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Yuning Jiang,et al.  Randomized visual phrases for object search , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[9]  Hung-Khoon Tan,et al.  Common pattern discovery using earth mover's distance and local flow maximization , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Javier Civera,et al.  Towards semantic SLAM using a monocular camera , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[11]  Dieter Fox,et al.  Unsupervised Feature Learning for RGB-D Based Object Recognition , 2012, ISER.

[12]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[13]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Tanaka Kanji,et al.  PartSLAM: Unsupervised part-based scene modeling for fast succinct map matching , 2013, IROS 2013.

[17]  Pedro F. Felzenszwalb,et al.  Reconfigurable models for scene recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Michael Isard,et al.  General Theory , 1969 .

[19]  Masatoshi Ando,et al.  Common Landmark Discovery in Urban Scenes , 2013, MVA.