论文信息 - Deep Weakly Supervised Positioning

Deep Weakly Supervised Positioning

PoseNet can map a photo to the position where it is taken, which is appealing in robotics. However training PoseNet requires full supervision, where ground truth positions are non-trivial to obtain. Can we train PoseNet without knowing the ground truth positions for each observation? We show that this is possible via constraint-based weak-supervision, leading to the proposed framework: DeepGPS. Particularly, using wheel-encoder-estimated distances traveled by a robot along random straight line segments as constraints between PoseNet outputs, DeepGPS can achieve a relative positioning error of less than 2%. Moreover, training DeepGPS can be done as autocalibration with almost no human attendance, which is more attractive than its competing methods that typically require careful and expert-level manual calibration. We conduct various experiments on simulated and real datasets to demonstrate the general applicability, effectiveness, and accuracy of DeepGPS, and perform a comprehensive analysis of its robustness.

Chen Feng | Li Ding | Yang Huang | Xuchu Xu | Ruoyu Wang

[1] John J. Leonard,et al. Towards visual ego-motion learning in robots , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[2] Francesc Moreno-Noguer,et al. PL-SLAM: Real-time monocular visual SLAM with points and lines , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[3] Wolfram Burgard,et al. VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry , 2018, IEEE Robotics and Automation Letters.

[4] Roberto Cipolla,et al. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[5] Robert C. Bolles,et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[6] Sertac Karaman,et al. Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[7] Roland Siegwart,et al. From Coarse to Fine: Robust Hierarchical Localization at Large Scale , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Gaurav Sharma,et al. Fusing structure from motion and lidar for dense accurate depth map estimation , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9] Stefano Ermon,et al. Label-Free Supervision of Neural Networks with Physics and Domain Knowledge , 2016, AAAI.

[10] Renaud Dubé,et al. SegMap: 3D Segment Mapping using Data-Driven Descriptors , 2018, Robotics: Science and Systems.

[11] Xiaolin Hu,et al. Delving deeper into convolutional neural networks for camera relocalization , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[12] Eric Brachmann,et al. Random forests versus Neural Networks — What's best for camera localization? , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[13] J. V. Gemert,et al. On Translation Invariance in CNNs: Convolutional Layers Can Exploit Absolute Spatial Location , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Wolfram Burgard,et al. Deep regression for monocular camera-based 6-DoF global localization in outdoor environments , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[15] Bo Yang,et al. DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[16] Chen Feng,et al. Point-plane SLAM for hand-held 3D sensors , 2013, 2013 IEEE International Conference on Robotics and Automation.

[17] Torsten Sattler,et al. Semantic Visual Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18] Dieter Schmalstieg,et al. Discriminative Feature-to-Point Matching in Image-Based Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19] Sen Wang,et al. DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[20] G LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[21] Andrea Vedaldi,et al. MapNet: An Allocentric Spatial Memory for Mapping Environments , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22] Carol C. Menassa,et al. Marker-Assisted Structure from Motion for 3D Environment Modeling and Object Pose Estimation , 2016 .

[23] Torsten Sattler,et al. Understanding the Limitations of CNN-Based Absolute Camera Pose Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Noah Snavely,et al. Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Roberto Cipolla,et al. Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Jitendra Malik,et al. Gibson Env: Real-World Perception for Embodied Agents , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Wolfram Burgard,et al. A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[29] Kin K. Leung,et al. A Survey of Indoor Localization Systems and Technologies , 2017, IEEE Communications Surveys & Tutorials.

[30] Jan Kautz,et al. Geometry-Aware Learning of Maps for Camera Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31] F. Fraundorfer,et al. Visual Odometry : Part II: Matching, Robustness, Optimization, and Applications , 2012, IEEE Robotics & Automation Magazine.

[32] Renaud Dubé,et al. SegMatch: Segment based place recognition in 3D point clouds , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[33] Andrew J. Davison,et al. DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[34] Tom Drummond,et al. EMPNet: Neural Localisation and Mapping Using Embedded Memory Points , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35] Eric Brachmann,et al. Learning Less is More - 6D Camera Localization via 3D Surface Regression , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36] Torsten Sattler,et al. InLoc: Indoor Visual Localization with Dense Matching and View Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37] H. Jin Kim,et al. Real-time monocular image-based 6-DoF localization , 2015, Int. J. Robotics Res..

[38] Dongbing Gu,et al. UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[39] Esa Rahtu,et al. Image-Based Localization Using Hourglass Networks , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[40] Gary R. Bradski,et al. ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[41] Edwin Olson,et al. AprilTag: A robust and flexible visual fiducial system , 2011, 2011 IEEE International Conference on Robotics and Automation.

[42] Chen Feng,et al. DeepMapping: Unsupervised Map Estimation From Multiple Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Sen Wang,et al. VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[45] Marc Pollefeys,et al. Live Metric 3D Reconstruction on Mobile Phones , 2013, 2013 IEEE International Conference on Computer Vision.

[46] Yuan Yuan,et al. Learning by Inertia: Self-supervised Monocular Visual Odometry for Road Vehicles , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[47] Hongdong Li,et al. Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[48] Ramón F. Brena,et al. Evolution of Indoor Positioning Technologies: A Survey , 2017, J. Sensors.

[49] Jitendra Malik,et al. Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).