Paper information: Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints

We introduce the first dense neural non-rigid structure from motion (N-NRSfM) approach, which can be trained end-to-end in an unsupervised manner from 2D point tracks. In contrast to competing methods, our combination of loss functions is fully differentiable and can be readily integrated into deep-learning systems. We formulate the deformation model as an auto-decoder and impose subspace constraints on the recovered latent space function in the frequency domain. Using the state recurrence cue, we classify the reconstructed non-rigid surfaces by their similarity and recover the period of the input sequence. Our N-NRSfM approach achieves competitive accuracy on widely used benchmark sequences and high visual quality on various real videos. Beyond serving as a standalone technique, our method enables multiple applications, including shape compression, completion, and interpolation. Combined with an encoder trained directly on 2D images, it performs scenario-specific monocular 3D shape reconstruction at interactive frame rates. To facilitate reproducibility and encourage further research in this direction, we open-source our code and provide trained models for research purposes.
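The abstract mentions analysing the latent space function in the frequency domain and recovering the period of the input sequence. The sketch below is not the authors' implementation; it only illustrates, under simplifying assumptions, how a period could be read off a sequence of per-frame latent vectors by picking the strongest non-constant frequency. The names `dft_magnitudes` and `estimate_period` and the toy latent trajectory are hypothetical and chosen for illustration.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return the magnitudes of the discrete Fourier transform of a real sequence."""
    n = len(signal)
    mags = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def estimate_period(latents):
    """Estimate the period (in frames) of a sequence of latent vectors.

    latents: list of equal-length lists, one (hypothetical) latent vector per frame.
    The spectral magnitude is summed over all latent dimensions, and the
    strongest frequency bin between 1 and the Nyquist bin is selected.
    """
    n = len(latents)
    dims = len(latents[0])
    energy = [0.0] * n
    for d in range(dims):
        channel = [z[d] for z in latents]
        mean = sum(channel) / n
        # Remove the mean so the constant (DC) component does not dominate.
        mags = dft_magnitudes([x - mean for x in channel])
        for k in range(n):
            energy[k] += mags[k]
    k_best = max(range(1, n // 2 + 1), key=lambda k: energy[k])
    return n / k_best


# Toy latent trajectory (an assumption for illustration): 64 frames,
# a 2D latent code that repeats every 16 frames.
frames = 64
toy_latents = [
    [math.sin(2 * math.pi * t / 16), math.cos(2 * math.pi * t / 16)]
    for t in range(frames)
]
period = estimate_period(toy_latents)
print(period)
```

The naive DFT keeps the sketch dependency-free; for long sequences a fast Fourier transform from a numerical library would be the usual choice. The paper's actual constraints and period detection may differ from this simplified picture.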
