Deep-mapnets: A residual network for 3D environment representation

The ability to localize within the coordinate system of a 3D model enables safe trajectory planning. While SLAM-based approaches estimate incremental poses relative to the first camera frame, they do not provide global localization. Exploiting the availability of mobile GPUs such as the Nvidia TX1, our method offers a novel, elegant, and high-performance visual approach to model-based robot localization. We propose to learn an environment representation with deep residual networks for localization in a known 3D model covering a real-world area of 25,000 square meters. We harness modern GPUs and game engines to render training images from a photorealistic 3D model, mimicking a downward-looking, high-flying drone. These images drive the training loop of a 50-layer deep residual network that learns to regress camera positions. We further apply data augmentation to accelerate training and to make the trained model robust to cross-domain shift, which we verify experimentally. We evaluate the trained model on both synthetically generated data and real imagery captured from a downward-looking drone. Predicting a camera pose takes about 25 milliseconds of GPU processing. Unlike previous methods, the proposed approach performs no rendering at test time and makes independent predictions from the learned environment representation.
