DUST: Dual Union of Spatio-Temporal Subspaces for Monocular Multiple Object 3D Reconstruction

We present an approach to reconstruct the 3D shape of multiple deforming objects from incomplete 2D trajectories acquired by a single camera. Additionally, we simultaneously provide spatial segmentation (i.e., we identify each of the objects in every frame) and temporal clustering (i.e., we split the sequence into primitive actions). This advances existing work, which only tackled the problem for one single object and non-occluded tracks. In order to handle several objects at a time from partial observations, we model point trajectories as a union of spatial and temporal subspaces, and optimize the parameters of both modalities, the non-observed point tracks and the 3D shape via augmented Lagrange multipliers. The algorithm is fully unsupervised and results in a formulation which does not need initialization. We thoroughly validate the method on challenging scenarios with several human subjects performing different activities which involve complex motions and close interaction. We show our approach achieves state-of-the-art 3D reconstruction results, while it also provides spatial and temporal segmentation.

[1]  Alessio Del Bue,et al.  Factorization for non-rigid and articulated structure using metric projections , 2009, CVPR.

[2]  Zhuwen Li,et al.  Perspective Motion Segmentation via Collaborative Clustering , 2013, 2013 IEEE International Conference on Computer Vision.

[3]  Michal Irani,et al.  Multi-frame optical flow estimation using subspace constraints , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4]  Jean Ponce,et al.  Convex Sparse Matrix Factorizations , 2008, ArXiv.

[5]  Aleix M. Martínez,et al.  Computing Smooth Time Trajectories for Camera and Deformable Shape in Structure from Motion with Occlusion , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Alessio Del Bue,et al.  Joint estimation of segmentation and structure from motion , 2013, Comput. Vis. Image Underst..

[7]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[8]  René Vidal,et al.  Motion Segmentation in the Presence of Outlying, Incomplete, or Corrupted Trajectories , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Lourdes Agapito,et al.  Modal Space: A Physics-Based Model for Sequential Estimation of Time-Varying Shape from Monocular Video , 2016, Journal of Mathematical Imaging and Vision.

[10]  T. Kanade,et al.  A multi-body factorization method for motion analysis , 1995, ICCV 1995.

[11]  Yaser Sheikh,et al.  Separable Spatiotemporal Priors for Convex Reconstruction of Time-Varying 3D Point Clouds , 2014, ECCV.

[12]  René Vidal,et al.  Sparse subspace clustering , 2009, CVPR.

[13]  Edward Y. Chang,et al.  Parallel Spectral Clustering in Distributed Systems , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Francesc Moreno-Noguer,et al.  Learning Shape, Motion and Elastic Models in Force Space , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Yi Ma,et al.  The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices , 2010, Journal of structural biology.

[16]  Chong-Ho Choi,et al.  Procrustean Normal Distribution for Non-Rigid Structure from Motion , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Hongdong Li,et al.  A simple prior-free method for non-rigid structure-from-motion factorization , 2012, CVPR.

[18]  Yaser Sheikh,et al.  In defense of orthonormality constraints for nonrigid structure from motion , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Jitendra Malik,et al.  Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction , 2014, NIPS.

[20]  Renato D. C. Monteiro,et al.  Digital Object Identifier (DOI) 10.1007/s10107-004-0564-1 , 2004 .

[21]  Takeo Kanade,et al.  Shape and motion from image streams under orthography: a factorization method , 1992, International Journal of Computer Vision.

[22]  Henning Biermann,et al.  Recovering non-rigid 3D shape from image streams , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[23]  Aaron Hertzmann,et al.  Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Simon Lucey,et al.  General trajectory prior for Non-Rigid reconstruction , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Adrien Bartoli,et al.  Coarse-to-fine low-rank structure-from-motion , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Takeo Kanade,et al.  Nonrigid Structure from Motion in Trajectory Space , 2008, NIPS.

[27]  Songhwai Oh,et al.  Consensus of Non-rigid Reconstructions , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Luc Van Gool,et al.  Multibody Structure-from-Motion in Practice , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Francesc Moreno-Noguer,et al.  Global Model with Local Interpretation for Dynamic Shape Reconstruction , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[30]  Simon Lucey,et al.  Complex Non-rigid Motion 3D Reconstruction by Union of Subspaces , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Francesc Moreno-Noguer,et al.  Mode-shape interpretation: Re-thinking modal space for recovering deformable shapes , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[32]  Rui Yu,et al.  Video Pop-up: Monocular 3D Reconstruction of Dynamic Scenes , 2014, ECCV.

[33]  Yaser Sheikh,et al.  3D Reconstruction of a Moving Point from a Series of 2D Projections , 2010, ECCV.

[34]  Alexandre Bernardino,et al.  Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition , 2013, 2013 IEEE International Conference on Computer Vision.

[35]  G. Sapiro,et al.  A collaborative framework for 3D alignment and classification of heterogeneous subvolumes in cryo-electron tomography. , 2013, Journal of structural biology.

[36]  Chong-Ho Choi,et al.  A Procrustean Markov Process for Non-rigid Structure Recovery , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Lourdes Agapito,et al.  Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[39]  Yong Yu,et al.  Robust Recovery of Subspace Structures by Low-Rank Representation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Aleix M. Martínez,et al.  Kernel non-rigid structure from motion , 2011, 2011 International Conference on Computer Vision.

[41]  Yuan Gao,et al.  Symmetric Non-rigid Structure from Motion for Category-Specific Object Structure Estimation , 2016, ECCV.