SeDAR - Semantic Detection and Ranging: Humans can Localise without LiDAR, can Robots?

How does a person work out their location using a floorplan? It is probably safe to say that we do not explicitly measure depths to every visible surface and try to match them against different pose estimates in the floorplan. And yet, this is exactly how most robotic scan-matching algorithms operate. Similarly, we do not extrude the 2D geometry present in the floorplan into 3D and try to align it to the real world. And yet, this is how most vision-based approaches localise. Humans do the exact opposite. Instead of depth, we use high-level semantic cues. Instead of extruding the floorplan up into the third dimension, we collapse the 3D world into a 2D representation. Evidence of this is that many of the floorplans we use in everyday life are not accurate, favouring instead a high density of discriminative landmarks. In this work, we use this insight to present a global localisation approach that relies solely on the semantic labels present in the floorplan and extracted from RGB images. While our approach is able to use range measurements if available, we demonstrate that they are unnecessary, as we can achieve results comparable to the state of the art without them.
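To make the idea of localising from semantic labels alone more concrete, the following is a minimal illustrative sketch and not the sensor model used in this work. It assumes a toy floorplan stored as a character grid, a handful of ray bearings each carrying a label predicted from an RGB image, and a single hypothetical match probability `p_match`. All names (`FLOORPLAN`, `cast_label`, `semantic_weight`) are invented for the example. Each candidate pose is scored by how well the labels it expects to see along each ray agree with the observed ones, with no range measurement involved.

```python
import math

# Hypothetical labelled floorplan (an assumption for illustration):
# '.' free space, 'W' wall, 'D' door, 'N' window.
# x indexes columns, y indexes rows, so +y points down the listing.
FLOORPLAN = [
    "WWWWDWWWW",
    "W.......W",
    "W.......N",
    "W.......W",
    "WWWWWWWWW",
]


def cast_label(grid, x, y, theta, step=0.05, max_range=20.0):
    """Walk along a ray from (x, y) and return the label of the first
    non-free cell it enters, or None if it leaves the grid."""
    dist = 0.0
    while dist < max_range:
        cx = int(math.floor(x + dist * math.cos(theta)))
        cy = int(math.floor(y + dist * math.sin(theta)))
        if not (0 <= cy < len(grid) and 0 <= cx < len(grid[0])):
            return None
        if grid[cy][cx] != ".":
            return grid[cy][cx]
        dist += step
    return None


def semantic_weight(grid, pose, bearings, observed, p_match=0.9):
    """Score a pose hypothesis using labels only: each ray contributes
    p_match if the expected label equals the observed one, and
    (1 - p_match) otherwise. Depth along the ray is never used."""
    x, y, heading = pose
    weight = 1.0
    for bearing, label in zip(bearings, observed):
        expected = cast_label(grid, x, y, heading + bearing)
        weight *= p_match if expected == label else (1.0 - p_match)
    return weight


# Labels that a camera at (4.5, 2.5) with heading 0 would report for three rays.
bearings = [-math.pi / 2, 0.0, math.pi / 2]
observed = ["D", "N", "W"]

# Three pose hypotheses (particles): the true pose, a heading flipped by
# 180 degrees, and a position shifted two cells along x.
particles = [(4.5, 2.5, 0.0), (4.5, 2.5, math.pi), (2.5, 2.5, 0.0)]
weights = [semantic_weight(FLOORPLAN, p, bearings, observed) for p in particles]
best = particles[weights.index(max(weights))]
```

In a Monte Carlo localisation setting, weights of this kind would typically feed the resampling step of a particle filter. A practical system would also need to handle label noise, a limited field of view, and any range information that happens to be available.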
