Random forests versus Neural Networks — What's best for camera localization?

This work addresses the task of camera localization in a known 3D scene given a single input RGB image. State-of-the-art approaches accomplish this in two steps: firstly, regressing for every pixel in the image its 3D scene coordinate and subsequently, using these coordinates to estimate the final 6D camera pose via RANSAC. To solve the first step. Random Forests (RFs) are typically used. On the other hand. Neural Networks (NNs) reign in many dense regression tasks, but are not test-time efficient. We ask the question: which of the two is best for camera localization? To address this, we make two method contributions: (1) a test-time efficient NN architecture which we term a ForestNet that is derived and initialized from a RF, and (2) a new fully-differentiable robust averaging technique for regression ensembles which can be trained end-to-end with a NN. Our experimental findings show that for scene coordinate regression, traditional NN architectures are superior to test-time efficient RFs and ForestNets, however, this does not translate to final 6D camera pose accuracy where RFs and ForestNets perform slightly better. To summarize, our best method, a ForestNet with a robust average, which has an equivalent fast and lightweight RF, improves over the state-of-the-art for camera localization on the 7-Scenes dataset [1]. While this work focuses on scene coordinate regression for camera localization, our innovations may also be applied to other continuous regression tasks.

[1]  Roberto Cipolla,et al.  Convolutional networks for real-time 6-DOF camera relocalization , 2015, ArXiv.

[2]  Tom Drummond,et al.  Unified Loop Closing and Recovery for Real Time Monocular SLAM , 2008, BMVC.

[3]  Johannes Welbl,et al.  Casting Random Forests as Artificial Neural Networks (and Profiting from It) , 2014, GCPR.

[4]  Ian D. Reid,et al.  Automatic Relocalization and Loop Closing for Real-Time Monocular SLAM , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Jian Sun,et al.  Global refinement of random forest , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Horst Bischof,et al.  Alternating Decision Forests , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Peter Kontschieder,et al.  Deep Neural Decision Forests , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[9]  Frank Plastria,et al.  On the point for which the sum of the distances to n given points is minimum , 2009, Ann. Oper. Res..

[10]  Walterio W. Mayol-Cuevas,et al.  6D Relocalisation for RGBD Cameras Using Synthetic View Regression , 2012, BMVC.

[11]  Torsten Sattler,et al.  Fast image-based localization using direct 2D-to-3D matching , 2011, 2011 International Conference on Computer Vision.

[12]  Wayne Ieee,et al.  Entropy Nets: From Decision Trees to Neural Networks , 1990 .

[13]  C. Rother,et al.  Mapping Stacked Decision Forests to Deep and Sparse Convolutional Neural Networks for Semantic Segmentation , 2015 .

[14]  Ieee Xplore,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Roberto Cipolla,et al.  Modelling uncertainty in deep learning for camera relocalization , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[16]  Vincent Lepetit,et al.  Keypoint recognition using randomized trees , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  David J. Fleet,et al.  Efficient Non-greedy Optimization of Decision Trees , 2015, NIPS.

[18]  James J. Little,et al.  Vision-based global localization and mapping for mobile robots , 2005, IEEE Transactions on Robotics.

[19]  Eric Brachmann,et al.  DSAC — Differentiable RANSAC for Camera Localization , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Eric Brachmann,et al.  Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jan-Michael Frahm,et al.  From structure-from-motion point clouds to fast location recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Ben Glocker,et al.  Real-Time RGB-D Camera Relocalization via Randomized Ferns for Keyframe Encoding , 2015, IEEE Transactions on Visualization and Computer Graphics.

[24]  Alberto Suárez,et al.  Globally Optimal Fuzzy Decision Trees for Classification and Regression , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[27]  Andrew W. Fitzgibbon,et al.  Exploiting uncertainty in regression forests for accurate camera relocalization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  David W. Murray,et al.  Improving the Agility of Keyframe-Based SLAM , 2008, ECCV.

[29]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[30]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.