Monocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two Perspective Frames

This paper proposes a new approach for monocular dense 3D reconstruction of a complex dynamic scene from two perspective frames. By applying superpixel over-segmentation to the image, we model a generically dynamic (hence non-rigid) scene with a piecewise planar and rigid approximation. In this way, we reduce the dynamic reconstruction problem to a “3D jigsaw puzzle ” problem which takes pieces from an unorganized “soup of superpixels". We show that our method provides an effective solution to the inherent relative scale ambiguity in structure-from-motion. Since our method does not assume a template prior, or per-object segmentation, or knowledge about the rigidity of the dynamic scene, it is applicable to a wide range of scenarios. Extensive experiments on both synthetic and real monocular sequences demonstrate the superiority of our method compared with the state-of-the-art methods.

[1]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Andrew Blake,et al.  Surface descriptions from stereo and shading , 1986, Image Vis. Comput..

[3]  Pascal Fua,et al.  A constrained latent variable model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Rui Yu,et al.  Video Pop-up: Monocular 3D Reconstruction of Dynamic Scenes , 2014, ECCV.

[5]  Vladimir Kolmogorov,et al.  Convergent Tree-Reweighted Message Passing for Energy Minimization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Takeo Igarashi,et al.  As-rigid-as-possible shape manipulation , 2005, SIGGRAPH '05.

[7]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Cordelia Schmid,et al.  Learning object class detectors from weakly annotated video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Hongdong Li,et al.  A Simple Prior-Free Method for Non-rigid Structure-from-Motion Factorization , 2012, International Journal of Computer Vision.

[10]  Jitendra Malik,et al.  Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction , 2014, NIPS.

[11]  Rui Yu,et al.  Direct, Dense, and Deformable: Template-Based Non-rigid 3D Reconstruction from RGB Video , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[12]  Takeo Kanade,et al.  Trajectory Space: A Dual Representation for Nonrigid Structure from Motion , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Pascal Fua,et al.  Template-free monocular reconstruction of deformable surfaces , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[14]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Didier Stricker,et al.  Flow Fields: Dense Correspondence Fields for Highly Accurate Large Displacement Optical Flow Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[17]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[18]  Robert J. Vanderbei,et al.  Interior-Point Methods for Nonconvex Nonlinear Programming: Filter Methods and Merit Functions , 2002, Comput. Optim. Appl..

[19]  Lourdes Agapito,et al.  Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Michael J. Black,et al.  MoSh: motion and shape capture from sparse markers , 2014, ACM Trans. Graph..

[21]  Fei Wang,et al.  Template-Free 3D Reconstruction of Poorly-Textured Nonrigid Surfaces , 2016, ECCV.

[22]  Marc Pollefeys,et al.  Joint 3D Scene Reconstruction and Class Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Hongdong Li,et al.  Spatio-temporal union of subspaces for multi-body non-rigid structure-from-motion , 2017, Pattern Recognit..

[24]  Vladlen Koltun,et al.  Dense Monocular Depth Estimation in Complex Dynamic Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Ce Liu,et al.  Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Hongdong Li,et al.  Multi-Body Non-Rigid Structure-from-Motion , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[27]  Alessio Del Bue,et al.  Piecewise Quadratic Reconstruction of Non-Rigid Surfaces from Monocular Sequences , 2010, ECCV.

[28]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[29]  Kiriakos N. Kutulakos,et al.  Non-rigid structure from locally-rigid motion , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.