Night-time indoor relocalization using depth image with Convolutional Neural Networks

In this work, we present a Convolutional Neural Network(CNN) with depth images as its inputs to solve the relocalization problem of a moving platform in night-time indoor environment. The developed algorithm can estimate the camera pose in an end-to-end manner with 0.40m and 7.49° errors in real time during night. It does not require any geometric computation as it directly uses a CNN for 6 DOFs pose regression. The architecture and its encoding methods of depth images are discussed. The proposed method is also evaluated on benchmark datasets collected from a motion capture system in our lab.

[1]  Vincent Lepetit,et al.  Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes , 2011, 2011 International Conference on Computer Vision.

[2]  Paul J. Besl,et al.  Method for registration of 3-D shapes , 1992, Other Conferences.

[3]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[5]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.

[7]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[10]  Tom Drummond,et al.  Machine Learning for High-Speed Corner Detection , 2006, ECCV.

[11]  Roberto Cipolla,et al.  Modelling uncertainty in deep learning for camera relocalization , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[12]  Dorian Gálvez-López,et al.  Bags of Binary Words for Fast Place Recognition in Image Sequences , 2012, IEEE Transactions on Robotics.

[13]  Sven Behnke,et al.  RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[14]  John Sell,et al.  The Xbox One System on a Chip and Kinect Sensor , 2014, IEEE Micro.

[15]  Zhengyou Zhang,et al.  Microsoft Kinect Sensor and Its Effect , 2012, IEEE Multim..

[16]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[17]  Ian D. Reid,et al.  Article in Press Robotics and Autonomous Systems ( ) – Robotics and Autonomous Systems a Comparison of Loop Closing Techniques in Monocular Slam , 2022 .

[18]  Javier González,et al.  Scene structure registration for localization and mapping , 2016, Robotics Auton. Syst..

[19]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[20]  Dongbing Gu,et al.  Extracting Semantic Information from Visual Data: A Survey , 2016, Robotics.

[21]  Hugh F. Durrant-Whyte,et al.  Simultaneous localization and mapping: part I , 2006, IEEE Robotics & Automation Magazine.

[22]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[23]  Wolfram Burgard,et al.  Multimodal deep learning for robust RGB-D object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[24]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Juan D. Tardós,et al.  Fast relocalisation and loop closing in keyframe-based SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[26]  Paul Newman,et al.  Appearance-only SLAM at large scale with FAB-MAP 2.0 , 2011, Int. J. Robotics Res..

[27]  Hugh Durrant-Whyte,et al.  Simultaneous localization and mapping (SLAM): part II , 2006 .

[28]  Wolfram Burgard,et al.  Robust place recognition for 3D range data based on point features , 2010, 2010 IEEE International Conference on Robotics and Automation.

[29]  Honglak Lee,et al.  Deep learning for detecting robotic grasps , 2013, Int. J. Robotics Res..

[30]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[31]  Damir Filko,et al.  Place recognition based on matching of planar surfaces and line segments , 2015, Int. J. Robotics Res..