
Self-Supervised Human Depth Estimation From Monocular Videos

Previous methods for estimating detailed human depth often require supervised training with 'ground-truth' depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data easy to collect and improves the generalization of the learned network. Self-supervision is achieved by minimizing a photo-consistency loss, evaluated between a video frame and its neighboring frames after they are warped according to the estimated depth and the 3D non-rigid motion of the human body. To recover this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion from these fits, which enables self-supervised learning of the shape details. Experiments demonstrate that our method generalizes better and performs much better on in-the-wild data.
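The warp-and-compare loss described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera with intrinsics fx, fy, cx, cy, grayscale frames, and a per-pixel 3D displacement field `motion` that moves each back-projected target-frame point to its position at the source frame. In the paper that motion comes from the per-frame SMPL fits; here it is simply a hypothetical input. The error is a plain L1 term over the human mask; SSIM terms, often added in self-supervised depth work, are omitted. All function names are hypothetical.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift every pixel of an H x W depth map to a 3D point in camera coordinates."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)  # v = row index, u = column index
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)  # shape (H, W, 3)


def bilinear_sample(image, u, v):
    """Sample an H x W grayscale image at real-valued pixel coordinates (u, v).

    Coordinates are clipped to the image so out-of-range samples do not raise;
    callers should only trust results where (u, v) lies inside the image.
    """
    h, w = image.shape
    u = np.clip(u, 0.0, w - 1.0)
    v = np.clip(v, 0.0, h - 1.0)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, w - 1)
    v1 = np.minimum(v0 + 1, h - 1)
    du = u - u0
    dv = v - v0
    top = (1 - du) * image[v0, u0] + du * image[v0, u1]
    bottom = (1 - du) * image[v1, u0] + du * image[v1, u1]
    return (1 - dv) * top + dv * bottom


def photo_consistency_loss(target, source, depth, motion, mask, fx, fy, cx, cy):
    """Mean L1 error between the target frame and the source frame warped into it.

    depth:  H x W depth estimated for the target frame.
    motion: H x W x 3 per-pixel 3D displacement from target time to source time
            (hypothetical input, e.g. derived from SMPL fits of both frames).
    mask:   H x W boolean human mask of the target frame.
    """
    # Move each 3D body point to where it is at the source frame.
    points = backproject(depth, fx, fy, cx, cy) + motion
    z = points[..., 2]
    safe_z = np.where(z > 1e-6, z, 1.0)  # avoid division by zero for invalid points
    # Project the moved points into the source image.
    u = fx * points[..., 0] / safe_z + cx
    v = fy * points[..., 1] / safe_z + cy
    h, w = depth.shape
    valid = mask & (z > 1e-6) & (u >= 0) & (u <= w - 1) & (v >= 0) & (v <= h - 1)
    if not valid.any():
        return 0.0
    warped = bilinear_sample(source, u, v)  # source frame resampled onto target pixels
    return float(np.abs(target - warped)[valid].mean())
```

For example, with an 8 x 8 horizontal-ramp image, constant depth 2, fx = fy = 4 and a 0.5-unit sideways body motion, a source frame shifted by one pixel yields zero loss on the valid pixels, whereas ignoring the motion yields a loss of 1. This is why the non-rigid motion must be accounted for: without it, correct depth would still be penalized whenever the body moves. A trainable version would use a differentiable sampler so gradients reach the depth network.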
