Understanding urban landuse from the above and ground perspectives: a deep learning, multimodal solution

Abstract: Landuse characterization is important for urban planning. It is traditionally performed through field surveys or manual photo interpretation, two practices that are time-consuming and labor-intensive. We therefore aim to automate landuse mapping at the urban-object level with a deep learning approach based on data from multiple sources (or modalities). We consider two image modalities: overhead imagery from Google Maps and, for each urban-object, an ensemble of ground-based pictures (side-views) from Google Street View (GSV). These modalities provide complementary visual information about the urban-objects. We propose an end-to-end trainable model that uses OpenStreetMap annotations as labels. The model accommodates a variable number of GSV pictures in the ground-based branch and can also function without ground pictures at prediction time. We evaluate the model over the area of Ile-de-France, France, and assess its generalization on a set of urban-objects from the city of Nantes, France. The proposed multimodal Convolutional Neural Network achieves considerably higher accuracies than methods that use a single image modality, making it suitable for automatic landuse map updates. Because it relies on data sources available for many cities worldwide, the approach could also be readily scaled to other cities.
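
The abstract states that the ground-based branch accepts a variable number of GSV pictures and that the model still works when none are available, but it does not say how this is achieved. The following is a minimal sketch of one way such behavior could be obtained, not the authors' implementation: per-picture feature vectors are averaged (an order-invariant pooling, assumed here) and a zero vector stands in when no picture exists (also an assumption). The function names `aggregate_ground_features` and `fuse` are hypothetical, and the feature vectors are taken as already extracted.

```python
from typing import List, Sequence

Vector = List[float]


def aggregate_ground_features(ground_feats: Sequence[Vector], dim: int) -> Vector:
    """Pool a variable-size set of per-picture feature vectors into one vector.

    Assumption: element-wise averaging, which does not depend on the order
    or the number of pictures. With no pictures, a zero vector is returned.
    """
    if not ground_feats:
        return [0.0] * dim
    n = len(ground_feats)
    return [sum(feat[i] for feat in ground_feats) / n for i in range(dim)]


def fuse(overhead_feat: Vector, ground_feats: Sequence[Vector], ground_dim: int) -> Vector:
    """Concatenate the overhead feature vector with the pooled ground features.

    Assumption: fusion by concatenation; a classifier would take this
    fixed-length vector as input regardless of how many pictures were given.
    """
    return list(overhead_feat) + aggregate_ground_features(ground_feats, ground_dim)


# Hypothetical urban-object with one overhead vector and two ground pictures.
overhead = [1.0, 2.0]
pictures = [[0.0, 4.0], [2.0, 0.0]]
with_ground = fuse(overhead, pictures, ground_dim=2)
without_ground = fuse(overhead, [], ground_dim=2)
```

In this sketch the fused vector has the same length whether an urban-object has many ground pictures or none, which is what allows a single downstream classifier to handle both cases.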
