Real-Time Camera Tracking: When is High Frame-Rate Best?

Higher frame-rates promise better tracking of rapid motion, yet advanced real-time vision systems rarely exceed the standard 10–60 Hz range, on the grounds that the computation required would be too great. In fact, the extra computational burden of a higher frame-rate is mitigated by a reduced cost per frame in trackers that take advantage of prediction: with less inter-frame motion, each frame needs less work. Additionally, when we consider the physics of image formation, a high frame-rate lowers the upper bound on shutter time, leading to less motion blur but more noise. Putting these factors together, how should the application-dependent performance requirements of accuracy, robustness and computational cost be optimised as frame-rate varies? Using 3D camera tracking as our test problem, and analysing a fundamental dense whole-image alignment approach, we open up a route to a systematic investigation via the careful synthesis of photorealistic video, combining ray-tracing of a detailed 3D scene, experimentally obtained photometric response and noise models, and rapid camera motions. Our multi-frame-rate, multi-resolution, multi-light-level dataset is based on tens of thousands of hours of CPU rendering time. Our experiments lead to quantitative conclusions about frame-rate selection, and highlight the crucial role of a full consideration of physical image formation in pushing tracking performance.
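To make the image-formation argument concrete, the following is a minimal Python sketch (our own illustration, not code from the paper) of how the exposure bound, motion blur extent and shot-noise-limited SNR scale with frame-rate. All numeric parameters (focal_px, omega, flux, read_noise) are hypothetical values chosen only for illustration.

```python
import numpy as np

# Sketch: at frame-rate f, exposure time is bounded by 1/f.
# Motion blur extent grows with exposure; photon count (and hence
# shot-noise-limited SNR) shrinks with it.

focal_px   = 600.0   # focal length in pixels (hypothetical)
omega      = 2.0     # camera angular velocity, rad/s (rapid motion)
flux       = 2.0e5   # photo-electrons per pixel per second (hypothetical light level)
read_noise = 5.0     # sensor read noise, electrons RMS (hypothetical)

for fps in [15, 30, 60, 120, 240]:
    shutter = 1.0 / fps                         # upper bound on shutter time
    blur_px = focal_px * omega * shutter        # approx. blur extent for pure rotation
    signal  = flux * shutter                    # collected photo-electrons
    noise   = np.sqrt(signal + read_noise**2)   # shot noise plus read noise
    print(f"{fps:4d} Hz: shutter {1e3*shutter:6.2f} ms, "
          f"blur {blur_px:6.1f} px, SNR {signal/noise:6.1f}")
```

The test algorithm in the paper is dense whole-image alignment over full 3D camera motion; as a reduced illustration of the same principle, the sketch below estimates a 2D image translation by Gauss-Newton minimisation of the whole-image photometric error. This 2D reduction is our own simplification, not the paper's tracker.

```python
import numpy as np

def dense_align_translation(I_ref, I_cur, iters=20):
    """Estimate a 2D translation p aligning I_cur to I_ref by minimising
    the whole-image photometric error sum_x (I_cur(x + p) - I_ref(x))^2
    with Gauss-Newton. Inputs are float grayscale arrays of equal shape."""
    p = np.zeros(2)  # (dy, dx)
    ys, xs = np.mgrid[0:I_ref.shape[0], 0:I_ref.shape[1]]
    for _ in range(iters):
        # Warp current image by the current estimate (nearest-neighbour
        # sampling keeps the sketch dependency-free).
        yw = np.clip(np.round(ys + p[0]).astype(int), 0, I_ref.shape[0] - 1)
        xw = np.clip(np.round(xs + p[1]).astype(int), 0, I_ref.shape[1] - 1)
        Iw = I_cur[yw, xw]
        r = (Iw - I_ref).ravel()                         # photometric residual
        gy, gx = np.gradient(Iw)                         # image gradients
        J = np.stack([gy.ravel(), gx.ravel()], axis=1)   # Jacobian w.r.t. p
        # Damped Gauss-Newton step (small damping guards against
        # a near-singular normal matrix in low-texture images).
        dp = np.linalg.solve(J.T @ J + 1e-6 * np.eye(2), -J.T @ r)
        p += dp
        if np.linalg.norm(dp) < 1e-3:
            break
    return p
```

The link back to frame-rate: at higher frame-rates the inter-frame displacement shrinks, so a predicted starting point is closer to the solution and fewer Gauss-Newton iterations are needed per frame, which is the prediction-driven reduction in per-frame cost the abstract refers to.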
