Empty Cities: Image Inpainting for a Dynamic-Object-Invariant Space

In this paper we present an end-to-end deep learning framework that turns images containing dynamic content, such as vehicles or pedestrians, into realistic static frames. This objective poses two main challenges: detecting all the dynamic objects, and inpainting the occluded static background with plausible imagery. We address the first with a convolutional network that learns a multiclass semantic segmentation of the image. We approach the second with a conditional generative adversarial model that takes the original dynamic image and its dynamic/static binary mask as input and generates the final static image. The generated images can be used for applications such as augmented reality or vision-based robot localization. To validate our approach, we present qualitative and quantitative comparisons against other state-of-the-art inpainting methods, removing the dynamic objects and hallucinating the static structure behind them. Furthermore, to demonstrate the potential of our results, we carry out pilot experiments that show the benefits of our proposal for visual place recognition. All our code is available at https://github.com/bertabescos/EmptyCities.
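
To make the two-stage pipeline described above concrete, the following is a minimal inference-time sketch, assuming a PyTorch setting. The module names, the tiny network definitions, and the `DYNAMIC_CLASSES` label ids are hypothetical placeholders standing in for the multiclass segmentation CNN and the mask-conditioned GAN generator described in the abstract; they are not the paper's actual architectures.

```python
# Sketch: segmentation CNN -> dynamic/static binary mask -> conditional generator.
# Placeholder networks; the real method uses far deeper segmentation and GAN models.
import torch
import torch.nn as nn

DYNAMIC_CLASSES = [11, 12, 13]  # assumed label ids for person / rider / vehicle


class TinySegNet(nn.Module):
    """Placeholder multiclass semantic segmentation network."""
    def __init__(self, num_classes: int = 19):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, num_classes, 1),
        )

    def forward(self, x):
        return self.net(x)  # per-pixel class logits, (B, C, H, W)


class TinyInpaintGenerator(nn.Module):
    """Placeholder conditional generator: conditioned on RGB image + binary mask."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, image, mask):
        # Concatenate the dynamic image and its mask along the channel axis.
        return self.net(torch.cat([image, mask], dim=1))


def dynamic_to_static(image, seg_net, generator):
    """image: (B, 3, H, W) tensor in [-1, 1]; returns (static image, dynamic mask)."""
    labels = seg_net(image).argmax(dim=1, keepdim=True)      # (B, 1, H, W) class ids
    mask = torch.zeros_like(labels, dtype=image.dtype)
    for c in DYNAMIC_CLASSES:                                 # 1 where a dynamic object is
        mask = torch.where(labels == c, torch.ones_like(mask), mask)
    return generator(image, mask), mask


if __name__ == "__main__":
    img = torch.rand(1, 3, 128, 256) * 2 - 1
    static_img, mask = dynamic_to_static(img, TinySegNet(), TinyInpaintGenerator())
    print(static_img.shape, mask.shape)
```

The sketch covers inference only; in the full method the generator is trained adversarially against a discriminator, together with reconstruction losses, on paired dynamic/static images.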
