Predicting world coordinates of pixels in RGB images using Convolutional Neural Network for camera relocalization

Convolutional Neural Networks (CNNs) have achieved great success in many computer vision tasks and have been applied to pose regression for camera relocalization. Traditional Simultaneous Localization and Mapping (SLAM) approaches use correspondences between camera coordinates and world coordinates to estimate the camera pose. In this paper, we present a new camera relocalization method that combines CNN-based regression of pixels' world coordinates with camera pose optimization. We also explore how CNNs and SCoRe Forests differ in their characteristics on world coordinate regression. Experiments show that, compared to SCoRe Forests, our approach has a larger camera relocalization error but performs better at predicting the world coordinates of pixels.
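
As a rough illustration of the correspondence idea mentioned above, the sketch below scores a candidate camera pose by how well it maps camera-frame points onto their world coordinates. It is a minimal sketch under stated assumptions, not the method of this paper: the pose is assumed to be a rotation matrix R and translation t with X_world = R X_cam + t, and the function names (`rotation_z`, `camera_to_world`, `mean_alignment_error`) are hypothetical. A pose optimizer would search for the (R, t) that minimizes such an error over the predicted correspondences.

```python
import math

def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def camera_to_world(R, t, p_cam):
    """Map a camera-frame point to the world frame: R * p_cam + t (assumed convention)."""
    return [sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)]

def mean_alignment_error(R, t, cam_points, world_points):
    """Mean Euclidean distance between transformed camera points and world points."""
    total = 0.0
    for p_cam, p_world in zip(cam_points, world_points):
        total += math.dist(camera_to_world(R, t, p_cam), p_world)
    return total / len(cam_points)

# Hypothetical data: world coordinates generated from a known pose.
R_true = rotation_z(math.pi / 2)
t_true = [1.0, 2.0, 3.0]
cam_points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
world_points = [camera_to_world(R_true, t_true, p) for p in cam_points]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
error_true = mean_alignment_error(R_true, t_true, cam_points, world_points)
error_identity = mean_alignment_error(identity, [0.0, 0.0, 0.0], cam_points, world_points)
```

Here the true pose gives an error of essentially zero, while the identity pose gives a clearly larger one, which is the signal a pose optimization step would exploit.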
