Efficient keyframe-based real-time camera tracking

We present a novel keyframe-based global localization method for markerless real-time camera tracking. Our system consists of an offline module that selects features from a group of reference images and an online module that matches these features against the live input video to quickly estimate the camera pose. The main contribution is the construction of an optimal set of keyframes from the reference images: the selected keyframes should approximately cover the entire space while minimizing content redundancy among them. This strategy greatly reduces computation and significantly reduces the number of repeated features. For a large-scale scene, capturing sufficient reference images and reconstructing the 3D environment requires significant effort. To reduce the offline preprocessing effort and improve tracking in larger-scale scenes, we also propose an online reference map extension module that reconstructs new 3D features in real time and selects online keyframes to extend the keyframe set. In addition, we develop a parallel-computing framework that uses both GPUs and multi-threading for speedup. Experimental results show that our method dramatically improves computational efficiency and eliminates jittering artifacts in real-time camera tracking.
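The abstract does not state how the optimal keyframe set is computed. As an illustration of the coverage-versus-redundancy trade-off only, here is a minimal sketch using a simple greedy heuristic. It assumes each reference frame is summarized by the set of 3D feature IDs it observes. The function `select_keyframes` and its parameters `coverage_target` and `redundancy_weight` are hypothetical, and this is not the authors' formulation.

```python
from typing import Dict, List, Optional, Set


def select_keyframes(
    frame_features: Dict[int, Set[int]],
    coverage_target: float = 0.95,   # hypothetical: fraction of all features to cover
    redundancy_weight: float = 0.5,  # hypothetical: penalty per already-covered feature
) -> List[int]:
    """Greedily pick frames that add many unseen features and few repeated ones.

    frame_features maps a reference-frame ID to the set of 3D feature IDs
    visible in that frame (an assumed input representation).
    """
    all_features: Set[int] = set().union(*frame_features.values())
    covered: Set[int] = set()
    selected: List[int] = []
    remaining = dict(frame_features)

    # Keep adding frames until the requested fraction of features is covered.
    while remaining and len(covered) < coverage_target * len(all_features):
        best_frame: Optional[int] = None
        best_score = 0.0
        for frame_id, feats in remaining.items():
            new = len(feats - covered)       # coverage gain
            repeated = len(feats & covered)  # content redundancy
            score = new - redundancy_weight * repeated
            if score > best_score:           # strict: ties keep the earlier frame
                best_frame, best_score = frame_id, score
        if best_frame is None:
            break  # no frame adds enough new content to justify its redundancy
        selected.append(best_frame)
        covered |= remaining.pop(best_frame)
    return selected


if __name__ == "__main__":
    frames = {
        0: {1, 2, 3, 4},     # almost entirely contained in frame 1
        1: {1, 2, 3, 4, 5},
        2: {5, 6, 7, 8},
        3: {8, 9, 10},
    }
    print(select_keyframes(frames))  # frame 0 is never selected
```

Because choosing an exactly optimal subset resembles set cover, which is NP-hard in general, a greedy pass is a common practical approximation. In this toy input, frame 0 adds no new features once frame 1 is chosen, so its redundancy penalty keeps it out of the keyframe set.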