Procrustean Regression Networks: Learning 3D Structure of Non-Rigid Objects from 2D Annotations

We propose a novel framework for training neural networks which is capable of learning 3D information of non-rigid objects when only 2D annotations are available as ground truths. Recently, there have been some approaches that incorporate the problem setting of non-rigid structure-from-motion (NRSfM) into deep learning to learn 3D structure reconstruction. The most important difficulty of NRSfM is to estimate both the rotation and deformation at the same time, and previous works handle this by regressing both of them. In this paper, we resolve this difficulty by proposing a loss function wherein the suitable rotation is automatically determined. Trained with the cost function consisting of the reprojection error and the low-rank term of aligned shapes, the network learns the 3D structures of such objects as human skeletons and faces during the training, whereas the testing is done in a single-frame basis. The proposed method can handle inputs with missing entries and experimental results validate that the proposed framework shows superior reconstruction performance to the state-of-the-art method on the Human 3.6M, 300-VW, and SURREAL datasets, even though the underlying network structure is very simple.

[1]  Geonho Cha,et al.  Unsupervised 3D Reconstruction Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Songhwai Oh,et al.  Complex Non-rigid 3D Shape Recovery Using a Procrustean Normal Distribution Mixture Model , 2015, International Journal of Computer Vision.

[3]  Cordelia Schmid,et al.  Learning from Synthetic Humans , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[5]  Chong-Ho Choi,et al.  A Procrustean Markov Process for Non-rigid Structure Recovery , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Lourdes Agapito,et al.  Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Subhransu Maji,et al.  3D Shape Induction from 2D Views of Multiple Objects , 2016, 2017 International Conference on 3D Vision (3DV).

[8]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Minsik Lee,et al.  Procrustean Regression: A Flexible Alignment-Based Framework for Nonrigid Structure Estimation , 2018, IEEE Transactions on Image Processing.

[10]  Honglak Lee,et al.  Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision , 2016, NIPS.

[11]  Yuandong Tian,et al.  Single Image 3D Interpreter Network , 2016, ECCV.

[12]  J. Tenenbaum,et al.  MarrNet : 3 D Shape Reconstruction via 2 . 5 D Sketches , 2017 .

[13]  Xiaowei Zhou,et al.  3D Shape Reconstruction from 2D Landmarks: A Convex Formulation , 2014, ArXiv.

[14]  Takeo Kanade,et al.  Trajectory Space: A Dual Representation for Nonrigid Structure from Motion , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[16]  Chong-Ho Choi,et al.  Procrustean Normal Distribution for Non-Rigid Structure from Motion , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Simon Lucey,et al.  Deep Non-Rigid Structure From Motion , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Andrea Vedaldi,et al.  C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Simon Lucey,et al.  Distill Knowledge From NRSfM for Weakly Supervised 3D Pose Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Yang Yang,et al.  Learning Category-Specific 3D Shape Models from Weakly Labeled 2D Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Jitendra Malik,et al.  Learning Category-Specific Mesh Reconstruction from Image Collections , 2018, ECCV.

[25]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Chen Kong,et al.  Prior-Less Compressible Structure from Motion , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Aaron Hertzmann,et al.  Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Adrien Bartoli,et al.  Coarse-to-fine low-rank structure-from-motion , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Henning Biermann,et al.  Recovering non-rigid 3D shape from image streams , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[31]  B. Hall Lie Groups, Lie Algebras, and Representations: An Elementary Introduction , 2004 .

[32]  David W. Jacobs,et al.  WarpNet: Weakly Supervised Matching for Single-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Hans-Peter Seidel,et al.  VNect , 2017, ACM Trans. Graph..

[34]  Jiajun Wu,et al.  MarrNet: 3D Shape Reconstruction via 2.5D Sketches , 2017, NIPS.

[35]  Aleix M. Martínez,et al.  Non-rigid structure from motion with complementary rank-3 spaces , 2011, CVPR 2011.

[36]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[37]  Stefanos Zafeiriou,et al.  The First Facial Landmark Tracking in-the-Wild Challenge: Benchmark and Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[38]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[39]  Lourdes Agapito,et al.  Factorization for non-rigid and articulated structure using metric projections , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Jitendra Malik,et al.  Learning Category-Specific Deformable 3D Models for Object Reconstruction , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Hongdong Li,et al.  A Simple Prior-Free Method for Non-rigid Structure-from-Motion Factorization , 2012, International Journal of Computer Vision.

[42]  Francesc Moreno-Noguer,et al.  Force-Based Representation for Non-Rigid Shape and Elastic Model Estimation , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Thomas Brox,et al.  Multi-view 3D Models from Single Images with a Convolutional Network , 2015, ECCV.

[45]  Xiaowei Zhou,et al.  Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).