Accurate Single Image Multi-modal Camera Pose Estimation

A well-known problem in photogrammetry and computer vision is the precise and robust determination of camera poses with respect to a given 3D model. In this work, we propose a novel multi-modal method for single-image camera pose estimation with respect to 3D models that carry intensity information (e.g., LiDAR data with reflectance information). We use a direct point-based rendering approach to generate synthetic 2D views from the 3D datasets, which bridges the dimensionality gap between image and model. The proposed method then establishes 2D/2D point and local region correspondences based on a novel self-similarity distance measure. Correct correspondences are robustly identified by searching for small regions with a similar geometric relationship of local self-similarities using a Generalized Hough Transform. After back-projection of the generated features into 3D, a standard Perspective-n-Point problem is solved to yield an initial camera pose. The pose is then accurately refined using an intensity-based 2D/3D registration approach. An evaluation on visible/infrared (Vis/IR) 2D datasets and on airborne and terrestrial 3D datasets shows that the proposed method is applicable to a wide range of sensor types. In addition, the approach outperforms standard global multi-modal 2D/3D registration approaches based on Mutual Information with respect to robustness and speed. Potential applications include multi-spectral texturing of 3D models, SLAM, sensor data fusion, and multi-spectral camera calibration and super-resolution.
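
As an illustration of the initial pose step only, the following minimal sketch recovers a camera pose from 3D/2D correspondences with a plain Direct Linear Transform (DLT). It is a stand-in for whichever Perspective-n-Point solver is actually used and is not the method's own implementation. The function names `project` and `pnp_dlt`, the intrinsic matrix `K`, and the synthetic points are all assumptions made for this example; the sketch further assumes known intrinsics, at least six non-coplanar points, and outlier-free correspondences.

```python
import numpy as np


def project(K, R, t, X):
    """Project Nx3 model points to Nx2 pixels using x ~ K (R X + t)."""
    x_img = (X @ R.T + t) @ K.T
    return x_img[:, :2] / x_img[:, 2:3]


def pnp_dlt(K, X, x):
    """Estimate (R, t) from N >= 6 non-coplanar 3D/2D correspondences (DLT)."""
    n = X.shape[0]
    # Remove the known intrinsics: work in normalized image coordinates.
    x_n = np.hstack([x, np.ones((n, 1))]) @ np.linalg.inv(K).T
    u = x_n[:, 0:1] / x_n[:, 2:3]
    v = x_n[:, 1:2] / x_n[:, 2:3]
    X_h = np.hstack([X, np.ones((n, 1))])
    zeros = np.zeros_like(X_h)
    # Each correspondence gives two linear equations in the 12 entries of [R | t].
    A = np.vstack([
        np.hstack([X_h, zeros, -u * X_h]),
        np.hstack([zeros, X_h, -v * X_h]),
    ])
    # The solution (up to scale and sign) is the right singular vector
    # belonging to the smallest singular value.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    # Choose the sign that places the points in front of the camera.
    if np.mean(X_h @ P[2]) < 0:
        P = -P
    # Project the 3x3 part onto the nearest rotation and undo the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    R = U @ D @ Vt
    t = P[:, 3] / S.mean()
    return R, t


# Synthetic check with hypothetical intrinsics and a known ground-truth pose.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a, b = 0.3, -0.2
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R_true, t_true = Rz @ Rx, np.array([0.1, -0.2, 5.0])
X = rng.uniform(-1.0, 1.0, size=(12, 3))   # back-projected 3D feature positions
x = project(K, R_true, t_true, X)          # matching 2D image positions
R_est, t_est = pnp_dlt(K, X, x)
```

With noisy or partly wrong correspondences, a linear solution of this kind would normally be wrapped in a robust estimator and followed by a refinement step, which is the role the intensity-based 2D/3D registration plays in the pipeline described above.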
