Dense multibody motion estimation and reconstruction from a handheld camera

Existing approaches to camera tracking and reconstruction from a single handheld camera for Augmented Reality (AR) focus on the reconstruction of static scenes. However, most real-world scenarios are dynamic and contain multiple independently moving rigid objects. This paper addresses the problem of simultaneous segmentation, motion estimation and dense 3D reconstruction of dynamic scenes. We propose a dense solution to all three elements of this problem (depth estimation, motion label assignment and rigid transformation estimation) directly from the raw video by optimizing a single cost function using a hill-climbing approach. We do not require prior knowledge of the number of objects present in the scene: the number of independent motion models and their parameters are estimated automatically. The resulting inference method combines the best techniques in discrete and continuous optimization: a state-of-the-art variational approach is used to estimate the dense depth maps, while the motion segmentation is achieved using discrete graph-cut-based optimization. For the rigid motion estimation of the independently moving objects, we propose a novel tracking approach designed to cope with their agile motion and the small fields of view they induce. Our experimental results on real sequences show that accurate segmentations and dense depth maps can be obtained in a completely automated way and used in marker-free AR applications.
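The idea of minimizing a single cost by hill climbing while the number of motion models is discovered automatically can be illustrated with a deliberately simplified sketch. It is not the method of the paper: there is no depth estimation, no variational solver and no graph cut, and each rigid motion is reduced to a 2D translation fitted to per-point displacements. The function names (`assign`, `refit`, `energy`, `local_fit`, `hill_climb`) and the `label_cost` parameter are hypothetical and introduced only for illustration. The toy energy is the sum of squared residuals plus a fixed cost per active model, and every accepted step lowers it.

```python
import numpy as np


def assign(flow, motions):
    """Label each displacement with the index of the closest motion model."""
    d = ((flow[:, None, :] - motions[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def refit(flow, labels):
    """Re-estimate each used model as the mean displacement of its points.

    Models with no support are dropped and the labels are renumbered.
    """
    used = np.unique(labels)
    motions = np.stack([flow[labels == k].mean(axis=0) for k in used])
    return np.searchsorted(used, labels), motions


def energy(flow, labels, motions, label_cost):
    """Toy cost: squared residuals plus a fixed cost per active model."""
    residual = ((flow - motions[labels]) ** 2).sum()
    return float(residual + label_cost * len(motions))


def local_fit(flow, motions, max_iters=100):
    """Alternate labelling and model refitting until the labels stop changing."""
    labels, motions = refit(flow, assign(flow, motions))
    for _ in range(max_iters):
        new_labels = assign(flow, motions)
        if np.array_equal(new_labels, labels):
            break
        labels, motions = refit(flow, new_labels)
    return labels, motions


def hill_climb(flow, proposals, label_cost=1.0):
    """Greedily remove models for as long as doing so lowers the energy."""
    flow = np.asarray(flow, dtype=float)
    labels, motions = local_fit(flow, np.asarray(proposals, dtype=float))
    best = energy(flow, labels, motions, label_cost)
    improved = True
    while improved and len(motions) > 1:
        improved = False
        for k in range(len(motions)):
            trial_labels, trial_motions = local_fit(flow, np.delete(motions, k, axis=0))
            trial = energy(flow, trial_labels, trial_motions, label_cost)
            if trial < best:  # accept only strictly downhill moves
                labels, motions, best = trial_labels, trial_motions, trial
                improved = True
                break
    return labels, motions, best
```

Given displacements drawn from two well-separated motions and a redundant set of initial proposals, a call such as `hill_climb(flow, proposals, label_cost=1.0)` would be expected to keep only two models, because an extra model costs more than the small reduction in residual it buys. In this toy setting the per-model cost stands in for whatever model-complexity term the full method uses.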
