A Generalized Multi-Task Learning Approach to Stereo DSM Filtering in Urban Areas

City models and height maps of urban areas serve as a valuable data source for numerous applications, such as disaster management or city planning. While this information is not globally available, it can be substituted by digital surface models (DSMs), automatically produced from inexpensive satellite imagery. However, stereo DSMs often suffer from noise and blur. Furthermore, they are heavily distorted by vegetation, which is of lesser relevance for most applications. Such basic models can be filtered by convolutional neural networks (CNNs), trained on labels derived from digital elevation models (DEMs) and 3D city models, in order to obtain a refined DSM. We propose a modular multi-task learning concept that consolidates existing approaches into a generalized framework. Our encoder-decoder models with shared encoders and multiple task-specific decoders leverage roof type classification as a secondary task and multiple objectives including a conditional adversarial term. The contributing single-objective losses are automatically weighted in the final multi-task loss function based on learned uncertainty estimates. We evaluated the performance of specific instances of this family of network architectures. Our method consistently outperforms the state of the art on common data, both quantitatively and qualitatively, and generalizes well to a new dataset of an independent study area.
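The abstract does not state the exact form of the uncertainty-based weighting, so the following is only a minimal sketch of one widely used formulation (the simplified log-variance form from Kendall, Gal, and Cipolla, 2018). Each task loss L_i is scaled by exp(-s_i), and s_i is added as a regularizer, where s_i is a learned log-variance. The function name `uncertainty_weighted_loss` and the plain-float inputs are illustrative assumptions. In a real training setup the s_i would be trainable parameters optimized jointly with the network weights.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine single-task losses into one multi-task loss.

    Assumed form (not taken from the abstract):
        total = sum_i exp(-s_i) * L_i + s_i
    where s_i = log(sigma_i^2) is a per-task log-variance.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("need exactly one log-variance per task loss")

    total = 0.0
    for loss, s in zip(task_losses, log_variances):
        weight = math.exp(-s)       # high uncertainty -> small weight
        total += weight * loss + s  # "+ s" penalizes inflating the uncertainty
    return total


def effective_weights(log_variances):
    """Return the multiplicative weight exp(-s_i) applied to each task loss."""
    return [math.exp(-s) for s in log_variances]
```

With all log-variances at zero, every task has weight 1 and the result reduces to the plain sum of the losses. Raising s_i for a task lowers its weight, while the additive s_i term keeps the optimizer from driving every weight to zero.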
