Deep Non-Rigid Structure From Motion With Missing Data

Current non-rigid structure from motion (NRSfM) algorithms are limited in two respects: (i) the number of images, and (ii) the type of shape variability they can handle. In this paper we propose a novel hierarchical sparse coding model for NRSfM that overcomes (i) and (ii) to such an extent that NRSfM can be applied to vision problems previously thought too ill-posed. In practice, our approach is realized by training an unsupervised deep neural network (DNN) auto-encoder whose architecture is designed to disentangle pose from 3D structure. Modern deep learning computational platforms allow us to solve NRSfM problems at an unprecedented scale and shape complexity. Our approach uses no 3D supervision, relying solely on 2D point correspondences. It also handles missing or occluded 2D points without the need for matrix completion. Extensive experiments show that our approach is more precise and more robust than all available state-of-the-art works, in some instances by an order of magnitude. We further propose a new quality measure, based on the network weights, that circumvents the need for 3D ground truth when assessing confidence in the reconstructability.
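To make the idea of supervising with 2D correspondences alone, while tolerating missing points, more concrete, the following is a minimal sketch of a masked 2D reprojection error. It is an illustration under stated assumptions, not the network or loss described in this paper: the orthographic camera model, the function name `masked_reprojection_error`, and the boolean visibility mask are all hypothetical choices made for this example.

```python
import numpy as np


def masked_reprojection_error(S, R, W, mask):
    """Mean squared 2D error over visible points only (illustrative sketch).

    S:    (P, 3) estimated 3D points
    R:    (3, 3) estimated camera rotation
    W:    (P, 2) observed 2D points; rows for missing points may hold NaN
    mask: (P,)   boolean, True where the 2D point was observed

    Assumes an orthographic camera, which is an assumption of this example.
    """
    # Rotate the shape, then drop the depth coordinate (orthographic projection).
    proj = (S @ R.T)[:, :2]
    # Missing points contribute exactly zero, even if W holds NaN there.
    diff = np.where(mask[:, None], proj - W, 0.0)
    # Normalize by the number of visible points (at least 1 to avoid 0/0).
    n_visible = max(int(mask.sum()), 1)
    return float((diff ** 2).sum() / n_visible)
```

Because occluded points are simply excluded from the error, no missing 2D entry has to be filled in beforehand, which is the sense in which a loss of this kind avoids a separate matrix completion step.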
