Dynamic Objects Segmentation for Visual Localization in Urban Environments

Visual localization and mapping is a crucial capability for addressing many challenges in mobile robotics, constituting a robust, accurate, and cost-effective approach to local and global pose estimation within prior maps. Yet in highly dynamic environments, such as crowded city streets, problems arise because large parts of the image can be covered by dynamic objects. Consequently, visual odometry pipelines often diverge and localization systems fail, since the detected features are inconsistent with the precomputed 3D model. In this work, we present an approach that automatically detects dynamic object instances to improve the robustness of vision-based localization and mapping in crowded environments. By training a convolutional neural network on a combination of synthetic and real-world data, dynamic object instance masks are learned in a semi-supervised fashion. The real-world data can be collected with a standard camera and requires only minimal post-processing. Our experiments show that a wide range of dynamic objects is reliably detected by the presented method. Promising performance is demonstrated on our own as well as publicly available datasets, which also highlights the generalization capabilities of the approach.
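
The paper itself gives no code, but the training setup it outlines maps naturally onto off-the-shelf instance-segmentation tooling. The sketch below is a minimal illustration, assuming torchvision >= 0.13 and a single "dynamic object" class; the FrameDataset class, its sample format, and all hyperparameters are hypothetical placeholders, not the authors' pipeline.

```python
# A minimal, hypothetical sketch of the setup described above:
# fine-tuning a COCO-pretrained Mask R-CNN on a pool of synthetic and
# real-world frames. Dataset class, sample format, and hyperparameters
# are illustrative assumptions, not taken from the paper.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset
from torchvision.models.detection import maskrcnn_resnet50_fpn


class FrameDataset(Dataset):
    """Yields (image, target) pairs in torchvision's detection format.

    `samples` is a list of (image, instance_mask) pairs: `image` is a
    float tensor of shape (3, H, W), `instance_mask` an int tensor of
    shape (H, W) holding one id per object instance (0 = background).
    Each frame is assumed to contain at least one instance.
    """

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, inst = self.samples[idx]
        ids = inst.unique()
        ids = ids[ids != 0]                      # drop background
        masks = inst == ids[:, None, None]       # (N, H, W) binary masks
        target = {
            "boxes": torch.stack([self._box(m) for m in masks]),  # (N, 4)
            "labels": torch.ones(len(ids), dtype=torch.int64),    # "dynamic"
            "masks": masks.to(torch.uint8),
        }
        return image, target

    @staticmethod
    def _box(mask):
        # Tight bounding box [x1, y1, x2, y2] around one binary mask.
        ys, xs = torch.where(mask)
        return torch.tensor([xs.min().item(), ys.min().item(),
                             xs.max().item() + 1, ys.max().item() + 1],
                            dtype=torch.float32)


def collate(batch):
    # Detection models take lists of images/targets, not stacked tensors.
    return tuple(zip(*batch))


def train(synthetic_samples, real_samples, epochs=10, lr=1e-4):
    # Mix the synthetic and real-world pools into one training set.
    data = ConcatDataset([FrameDataset(synthetic_samples),
                          FrameDataset(real_samples)])
    loader = DataLoader(data, batch_size=2, shuffle=True, collate_fn=collate)

    model = maskrcnn_resnet50_fpn(weights="DEFAULT")  # COCO-pretrained
    model.train()
    optim = torch.optim.Adam(model.parameters(), lr=lr)

    for _ in range(epochs):
        for images, targets in loader:
            # In train mode the model returns a dict of loss terms.
            losses = model(list(images), list(targets))
            loss = sum(losses.values())
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model
```

Mixing the two pools through a single ConcatDataset mirrors the abstract's combination of synthetic and real-world data; in practice, the ratio between the pools and which backbone layers are frozen during fine-tuning would need tuning.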
