Deep Weakly Supervised Positioning

PoseNet can map a photo to the position where it is taken, which is appealing in robotics. However training PoseNet requires full supervision, where ground truth positions are non-trivial to obtain. Can we train PoseNet without knowing the ground truth positions for each observation? We show that this is possible via constraint-based weak-supervision, leading to the proposed framework: DeepGPS. Particularly, using wheel-encoder-estimated distances traveled by a robot along random straight line segments as constraints between PoseNet outputs, DeepGPS can achieve a relative positioning error of less than 2%. Moreover, training DeepGPS can be done as autocalibration with almost no human attendance, which is more attractive than its competing methods that typically require careful and expert-level manual calibration. We conduct various experiments on simulated and real datasets to demonstrate the general applicability, effectiveness, and accuracy of DeepGPS, and perform a comprehensive analysis of its robustness.

[1]  John J. Leonard,et al.  Towards visual ego-motion learning in robots , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[2]  Francesc Moreno-Noguer,et al.  PL-SLAM: Real-time monocular visual SLAM with points and lines , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[3]  Wolfram Burgard,et al.  VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry , 2018, IEEE Robotics and Automation Letters.

[4]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[5]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[6]  Sertac Karaman,et al.  Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[7]  Roland Siegwart,et al.  From Coarse to Fine: Robust Hierarchical Localization at Large Scale , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Gaurav Sharma,et al.  Fusing structure from motion and lidar for dense accurate depth map estimation , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9]  Stefano Ermon,et al.  Label-Free Supervision of Neural Networks with Physics and Domain Knowledge , 2016, AAAI.

[10]  Renaud Dubé,et al.  SegMap: 3D Segment Mapping using Data-Driven Descriptors , 2018, Robotics: Science and Systems.

[11]  Xiaolin Hu,et al.  Delving deeper into convolutional neural networks for camera relocalization , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[12]  Eric Brachmann,et al.  Random forests versus Neural Networks — What's best for camera localization? , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[13]  J. V. Gemert,et al.  On Translation Invariance in CNNs: Convolutional Layers Can Exploit Absolute Spatial Location , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Wolfram Burgard,et al.  Deep regression for monocular camera-based 6-DoF global localization in outdoor environments , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[15]  Bo Yang,et al.  DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[16]  Chen Feng,et al.  Point-plane SLAM for hand-held 3D sensors , 2013, 2013 IEEE International Conference on Robotics and Automation.

[17]  Torsten Sattler,et al.  Semantic Visual Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Dieter Schmalstieg,et al.  Discriminative Feature-to-Point Matching in Image-Based Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Sen Wang,et al.  DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[20]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[21]  Andrea Vedaldi,et al.  MapNet: An Allocentric Spatial Memory for Mapping Environments , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Carol C. Menassa,et al.  Marker-Assisted Structure from Motion for 3D Environment Modeling and Object Pose Estimation , 2016 .

[23]  Torsten Sattler,et al.  Understanding the Limitations of CNN-Based Absolute Camera Pose Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Roberto Cipolla,et al.  Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Jitendra Malik,et al.  Gibson Env: Real-World Perception for Embodied Agents , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Wolfram Burgard,et al.  A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[29]  Kin K. Leung,et al.  A Survey of Indoor Localization Systems and Technologies , 2017, IEEE Communications Surveys & Tutorials.

[30]  Jan Kautz,et al.  Geometry-Aware Learning of Maps for Camera Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  F. Fraundorfer,et al.  Visual Odometry : Part II: Matching, Robustness, Optimization, and Applications , 2012, IEEE Robotics & Automation Magazine.

[32]  Renaud Dubé,et al.  SegMatch: Segment based place recognition in 3D point clouds , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[33]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[34]  Tom Drummond,et al.  EMPNet: Neural Localisation and Mapping Using Embedded Memory Points , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Eric Brachmann,et al.  Learning Less is More - 6D Camera Localization via 3D Surface Regression , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Torsten Sattler,et al.  InLoc: Indoor Visual Localization with Dense Matching and View Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  H. Jin Kim,et al.  Real-time monocular image-based 6-DoF localization , 2015, Int. J. Robotics Res..

[38]  Dongbing Gu,et al.  UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[39]  Esa Rahtu,et al.  Image-Based Localization Using Hourglass Networks , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[40]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[41]  Edwin Olson,et al.  AprilTag: A robust and flexible visual fiducial system , 2011, 2011 IEEE International Conference on Robotics and Automation.

[42]  Chen Feng,et al.  DeepMapping: Unsupervised Map Estimation From Multiple Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Sen Wang,et al.  VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[45]  Marc Pollefeys,et al.  Live Metric 3D Reconstruction on Mobile Phones , 2013, 2013 IEEE International Conference on Computer Vision.

[46]  Yuan Yuan,et al.  Learning by Inertia: Self-supervised Monocular Visual Odometry for Road Vehicles , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[47]  Hongdong Li,et al.  Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[48]  Ramón F. Brena,et al.  Evolution of Indoor Positioning Technologies: A Survey , 2017, J. Sensors.

[49]  Jitendra Malik,et al.  Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).