Toward Seamless Multiview Scene Analysis From Satellite to Street Level

In this paper, we discuss and review how combining multiview imagery from satellite to street level can benefit scene analysis. Numerous works merge information from remote sensing with images acquired from the ground for tasks such as object detection, robot guidance, or scene understanding. Combining overhead and street-level images is challenging because of the strongly varying viewpoints, the different image scales, and the differences in illumination, sensor modality, and time of acquisition. Direct (dense) matching of images on a per-pixel basis is thus often impossible, and one has to resort to the alternative strategies discussed in this paper. To this end, we review recent works that attempt to combine images taken from ground and overhead views for purposes such as scene registration, reconstruction, or classification. After the theoretical review, we present three recent methods to showcase the interest and potential impact of such fusion on real applications (change detection, image orientation, and tree cataloging), whose logic can then be reused to extend the use of ground-based images in remote sensing and vice versa. Through this review, we advocate that cross-fertilization between remote sensing, computer vision, and machine learning is very valuable for making the best of the geographic data available from Earth observation sensors and ground imagery. Despite its challenges, we believe that integrating these complementary data sources will lead to major breakthroughs in Big GeoData and open new perspectives for this exciting and emerging field.
