Video Registration to SfM Models

Registering image data to Structure from Motion (SfM) point clouds is widely used to find precise camera location and orientation with respect to a world model. In case of videos one constraint has previously been unexploited: temporal smoothness. Without temporal smoothness the magnitude of the pose error in each frame of a video will often dominate the magnitude of frame-to-frame pose change. This hinders application of methods requiring stable poses estimates (e.g. tracking, augmented reality). We incorporate temporal constraints into the image-based registration setting and solve the problem by pose regularization with model fitting and smoothing methods. This leads to accurate, gap-free and smooth poses for all frames. We evaluate different methods on challenging synthetic and real street-view SfM data for varying scenarios of motion speed, outlier contamination, pose estimation failures and 2D-3D correspondence noise. For all test cases a 2 to 60-fold reduction in root mean squared (RMS) positional error is observed, depending on pose estimation difficulty. For varying scenarios, different methods perform best. We give guidance which methods should be preferred depending on circumstances and requirements.

[1]  Tomás Pajdla,et al.  Avoiding Confusing Features in Place Recognition , 2010, ECCV.

[2]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[3]  Seth J. Teller,et al.  Wide-Area Egomotion Estimation from Known 3D Structure , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[5]  Shiqi Li,et al.  A Robust O(n) Solution to the Perspective-n-Point Problem , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Feng Wu,et al.  Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Mubarak Shah,et al.  Accurate Image Localization Based on Google Maps Street View , 2010, ECCV.

[9]  Luc Van Gool,et al.  Sparse Quantization for Patch Description , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Yannis Avrithis,et al.  VIRaL: Visual Image Retrieval and Localization , 2010, Multimedia Tools and Applications.

[11]  Daniel P. Huttenlocher,et al.  Location Recognition Using Prioritized Feature Matching , 2010, ECCV.

[12]  Pascal Fua,et al.  Worldwide Pose Estimation Using 3D Point Clouds , 2012, ECCV.

[13]  Noah Snavely,et al.  Graph-Based Discriminative Learning for Location Recognition , 2013, International Journal of Computer Vision.

[14]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[15]  Roberto Cipolla,et al.  An Image-Based System for Urban Navigation , 2004, BMVC.

[16]  Peter F. Sturm,et al.  Pose estimation using both points and lines for geo-localization , 2011, 2011 IEEE International Conference on Robotics and Automation.

[17]  Andreas Geiger,et al.  Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  David Martin,et al.  Street View Motion-from-Structure-from-Motion , 2013, 2013 IEEE International Conference on Computer Vision.

[19]  James J. Little,et al.  Vision-based mobile robot localization and mapping using scale-invariant features , 2001, Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164).

[20]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[21]  Amir Roshan Zamir,et al.  City scale geo-spatial trajectory estimation of a moving camera , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  David G. Lowe,et al.  What and Where: 3D Object Recognition with Accurate Pose , 2006, Toward Category-Level Object Recognition.

[23]  Torsten Sattler,et al.  Improving Image-Based Localization by Active Correspondence Search , 2012, ECCV.

[24]  Michael F. Cohen,et al.  Real-time image-based 6-DOF localization in large-scale environments , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Motilal Agrawal,et al.  A Lie Algebraic Approach for Consistent Pose Registration for General Euclidean Motion , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[26]  G. Klein,et al.  Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[27]  Richard Szeliski,et al.  Building Rome in a day , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Torsten Sattler,et al.  Fast image-based localization using direct 2D-to-3D matching , 2011, 2011 International Conference on Computer Vision.

[30]  Lorenzo Torresani,et al.  Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Marc Pollefeys,et al.  Live Metric 3D Reconstruction on Mobile Phones , 2013, 2013 IEEE International Conference on Computer Vision.

[32]  Alexei A. Efros,et al.  Image sequence geolocation with human travel priors , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[33]  Feng Wu,et al.  3D visual phrases for landmark recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  V. Lepetit,et al.  EPnP: An Accurate O(n) Solution to the PnP Problem , 2009, International Journal of Computer Vision.

[35]  Andrew J. Davison,et al.  Live dense reconstruction with a single moving camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[36]  Jan-Michael Frahm,et al.  From structure-from-motion point clouds to fast location recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Yubin Kuang,et al.  Revisiting the PnP Problem: A Fast, General and Optimal Solution , 2013, 2013 IEEE International Conference on Computer Vision.

[38]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Mathieu Aubry,et al.  Painting-to-3D model alignment via discriminative visual elements , 2014, TOGS.

[40]  Jake K. Aggarwal,et al.  Matching Aerial Images to 3-D Terrain Maps , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[41]  Supun Samarasekera,et al.  Pose estimation, model refinement, and enhanced visualization using video , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[42]  Wenyi Zhao,et al.  Alignment of continuous video onto 3D point clouds , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Rita Cucchiara,et al.  Estimating Geospatial Trajectory of a Moving Camera , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[44]  Masatoshi Okutomi,et al.  ASPnP: An Accurate and Scalable Solution to the Perspective-n-Point Problem , 2013, IEICE Trans. Inf. Syst..