Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints

We introduce the first dense neural non-rigid structure from motion (N-NRSfM) approach, which can be trained end-to-end in an unsupervised manner from 2D point tracks. In contrast to competing methods, our combination of loss functions is fully differentiable and can be readily integrated into deep-learning systems. We model the deformations with an auto-decoder and impose subspace constraints on the recovered latent space function in the frequency domain. Using the state recurrence cue, we classify the reconstructed non-rigid surfaces by their similarity and recover the period of the input sequence. Our N-NRSfM approach achieves competitive accuracy on widely used benchmark sequences and high visual quality on various real videos. Apart from being a standalone technique, our method enables multiple applications, including shape compression, completion and interpolation. Combined with an encoder trained directly on 2D images, it performs scenario-specific monocular 3D shape reconstruction at interactive frame rates. To facilitate reproducibility of the results and encourage further work in this research direction, we open-source our code and provide trained models for research purposes.
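As a rough illustration of what working with a latent space function in the frequency domain can look like, the following NumPy sketch treats the per-frame latent codes as a signal over time, estimates its dominant period from the power spectrum, and restricts it to a few low-frequency components. It is not the authors' implementation: the function names `dominant_period` and `low_pass`, the array layout `(T, D)`, and the choice of a plain low-pass truncation as the "subspace" are all assumptions made for this example.

```python
import numpy as np


def dominant_period(latents: np.ndarray) -> float:
    """Estimate the dominant period (in frames) of a latent trajectory.

    latents: array of shape (T, D), one D-dimensional latent code per frame
    (hypothetical layout, assumed for this sketch).
    """
    num_frames = latents.shape[0]
    # Remove the per-dimension mean so the zero-frequency bin does not dominate.
    centered = latents - latents.mean(axis=0, keepdims=True)
    # Power spectrum along time, summed over latent dimensions.
    power = (np.abs(np.fft.rfft(centered, axis=0)) ** 2).sum(axis=1)
    power[0] = 0.0
    k = int(np.argmax(power))
    if k == 0:
        raise ValueError("latent trajectory is constant; no period to recover")
    # Frequency bin k corresponds to k cycles over the whole sequence.
    return num_frames / k


def low_pass(latents: np.ndarray, keep: int) -> np.ndarray:
    """Project the trajectory onto its `keep` lowest-frequency components."""
    coeffs = np.fft.rfft(latents, axis=0)
    coeffs[keep:] = 0.0
    return np.fft.irfft(coeffs, n=latents.shape[0], axis=0)


# Synthetic example: a 2D latent code that repeats every 30 frames.
t = np.arange(120)
z = np.stack([np.sin(2 * np.pi * t / 30), np.cos(2 * np.pi * t / 30)], axis=1)
period = dominant_period(z)
z_smooth = low_pass(z, keep=8)
```

On this synthetic trajectory the recovered period matches the 30-frame cycle, and the low-pass projection leaves the signal essentially unchanged because all of its energy sits in one low-frequency bin. Real latent trajectories are unlikely to be this clean, so the peak of the spectrum should be read as an estimate only.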
