Semantic Cross-View Matching

Matching cross-view images is challenging because the appearance and viewpoints are significantly different. While low-level features based on gradient orientations or filter responses can drastically vary with such changes in viewpoint, semantic information of images however shows an invariant characteristic in this respect. Consequently, semantically labeled regions can be used for performing cross-view matching. In this paper, we therefore explore this idea and propose an automatic method for detecting and representing the semantic information of an RGB image with the goal of performing cross-view matching with a (non-RGB) geographic information system (GIS). A segmented image forms the input to our system with segments assigned to semantic concepts such as traffic signs, lakes, roads, foliage, etc. We design a descriptor to robustly capture both, the presence of semantic concepts and the spatial layout of those segments. Pairwise distances between the descriptors extracted from the GIS map and the query image are then used to generate a shortlist of the most promising locations with similar semantic concepts in a consistent spatial layout. An experimental evaluation with challenging query images and a large urban area shows promising results.

[1]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[2]  Frank P. Ferrie,et al.  A Note on Metric Properties for Some Divergence Measures: The Gaussian Case , 2012, ACML.

[3]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[4]  Matthieu Cord,et al.  Semantic pooling for image categorization using multiple kernel learning , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[5]  Richard Szeliski,et al.  Alignment of 3D point clouds to overhead images , 2009, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[6]  Christian Früh,et al.  Constructing 3D city models by merging ground-based and airborne views , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[7]  Jitendra Malik,et al.  Shape Context: A New Descriptor for Shape Matching and Object Recognition , 2000, NIPS.

[8]  Dieter Fox,et al.  RGB-(D) scene labeling: Features and algorithms , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Pascal Vasseur,et al.  Globally optimal line clustering and vanishing point estimation in Manhattan world , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[11]  T. Hughes,et al.  Signals and systems , 2006, Genome Biology.

[12]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[13]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[14]  Mubarak Shah,et al.  GIS-Assisted Object Detection and Geospatial Localization , 2014, ECCV.

[15]  Marc Pollefeys,et al.  Large Scale Visual Geo-Localization of Images in Mountainous Terrain , 2012, ECCV.

[16]  Mayank Bansal,et al.  Ultra-wide Baseline Facade Matching for Geo-localization , 2012, ECCV Workshops.

[17]  Vincent Lepetit,et al.  A fast local descriptor for dense matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Luc Van Gool,et al.  Wide-baseline stereo matching with line segments , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20]  Luc Van Gool,et al.  Scale-invariant line descriptors for wide baseline matching , 2014, IEEE Winter Conference on Applications of Computer Vision.

[21]  Nikolas P. Galatsanos,et al.  An Analytic Distance Metric for Gaussian Mixture Models with Application in Image Retrieval , 2005, ICANN.

[22]  Matthew Brand,et al.  SKYLINE2GPS: Localization in urban canyons using omni-skylines , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[23]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[24]  Ashutosh Saxena,et al.  Make3D: Learning 3D Scene Structure from a Single Still Image , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[26]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[27]  Mayank Bansal,et al.  Ultra-wide Baseline Facade Matching for Geo-localization , 2012, ECCV Workshops.

[28]  Serge J. Belongie,et al.  Cross-View Image Geolocalization , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Steven M. Seitz,et al.  Accurate Geo-Registration by Ground-to-Aerial Image Matching , 2014, 2014 2nd International Conference on 3D Vision.

[30]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[31]  James H. Elder,et al.  Efficient Edge-Based Methods for Estimating Manhattan Frames in Urban Imagery , 2008, ECCV.

[32]  Marc Pollefeys,et al.  Leveraging 3D City Models for Rotation Invariant Place-of-Interest Recognition , 2011, International Journal of Computer Vision.