Real-time RGB-D camera relocalization

We introduce an efficient camera relocalization approach which can be easily integrated into real-time 3D reconstruction methods, such as KinectFusion. Our approach makes use of compact encoding of whole image frames which enables both online harvesting of keyframes in tracking mode, and fast retrieval of pose proposals when tracking is lost. The encoding scheme is based on randomized ferns and simple binary feature tests. Each fern generates a small block code, and the concatenation of codes yields a compact representation of each camera frame. Based on those representations we introduce an efficient frame dissimilarity measure which is defined via the block-wise hamming distance (BlockHD). We illustrate how BlockHDs between a query frame and a large set of keyframes can be simultaneously evaluated by traversing the nodes of the ferns and counting image co-occurrences in corresponding code tables. In tracking mode, this mechanism allows us to consider every frame/pose pair as a potential keyframe. A new keyframe is added only if it is sufficiently dissimilar from all previously stored keyframes. For tracking recovery, camera poses are retrieved that correspond to the keyframes with smallest BlockHDs. The pose proposals are then used to reinitialize the tracking algorithm. Harvesting of keyframes and pose retrieval are computationally efficient with only small impact on the run-time performance of the 3D reconstruction. Integrating our relocalization method into KinectFusion allows seamless continuation of mapping even when tracking is frequently lost. Additionally, we demonstrate how marker-free augmented reality, in particular, can benefit from this integration by enabling a smoother and continuous AR experience.

[1]  Vincent Lepetit,et al.  Fast Keypoint Recognition Using Random Ferns , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Ian D. Reid,et al.  Automatic Relocalization and Loop Closing for Real-Time Monocular SLAM , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Paul Newman,et al.  Appearance-only SLAM at large scale with FAB-MAP 2.0 , 2011, Int. J. Robotics Res..

[4]  Nassir Navab,et al.  Intraoperative Laparoscope Augmentation for Port Placement and Resection Planning in Minimally Invasive Liver Resection , 2008, IEEE Transactions on Medical Imaging.

[5]  Tom Drummond,et al.  Going out: robust model-based tracking for outdoor augmented reality , 2006, 2006 IEEE/ACM International Symposium on Mixed and Augmented Reality.

[6]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  John J. Leonard,et al.  Kintinuous: Spatially Extended KinectFusion , 2012, AAAI 2012.

[8]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[9]  Bernd Schwald,et al.  An Augmented Reality System for Training and Assistence to Maintenance in the Industrial Context , 2003, WSCG.

[10]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Leonidas J. Guibas,et al.  Robust global registration , 2005, SGP '05.

[12]  Slobodan Ilic,et al.  RGB-D camera-based parallel tracking and meshing , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[13]  Vincent Lepetit,et al.  Keypoint recognition using randomized trees , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Marsette Vona,et al.  Moving Volume KinectFusion , 2012, BMVC.

[15]  Daniel Cremers,et al.  Real-time visual odometry from dense RGB-D images , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[16]  Tom Drummond,et al.  Unified Loop Closing and Recovery for Real Time Monocular SLAM , 2008, BMVC.

[17]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.

[18]  Antonio Criminisi,et al.  Epitomic location recognition , 2008, CVPR.

[19]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[20]  Matthew A. Brown,et al.  Learning Local Image Descriptors , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[22]  Tom Drummond,et al.  Faster and Better: A Machine Learning Approach to Corner Detection , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  John J. Leonard,et al.  Robust Tracking for Real-Time Dense RGB-D Mapping with Kintinuous , 2012 .

[24]  David W. Murray,et al.  Improving the Agility of Keyframe-Based SLAM , 2008, ECCV.

[25]  Walterio W. Mayol-Cuevas,et al.  6D Relocalisation for RGBD Cameras Using Synthetic View Regression , 2012, BMVC.

[26]  Dieter Fox,et al.  RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments , 2012, Int. J. Robotics Res..

[27]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.