Temporally Consistent 3D Human Pose Estimation Using Dual 360° Cameras

This paper presents a 3D human pose estimation system that uses a stereo pair of 360° sensors to capture the complete scene from a single location. The approach combines the advantages of omnidirectional capture, the accuracy of multiple view 3D pose estimation and the portability of monocular acquisition. Joint monocular belief maps for joint locations are estimated from 360° images and are used to fit a 3D skeleton to each frame. Temporal data association and smoothing is performed to produce accurate 3D pose estimates throughout the sequence. We evaluate our system on the Panoptic Studio dataset, as well as real 360° video for tracking multiple people, demonstrating an average Mean Per Joint Position Error of 12.47cm with 30cm baseline cameras. We also demonstrate improved capabilities over perspective and 360° multi-view systems when presented with limited camera views of the subject.

[1]  Victor Lempitsky,et al.  Learnable Triangulation of Human Pose , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Lourdes Agapito,et al.  Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Adrian Hilton,et al.  Human-Centric Scene Understanding from Single View 360 Video , 2018, 2018 International Conference on 3D Vision (3DV).

[4]  Christian Theobalt,et al.  MonoPerfCap , 2017, ACM Trans. Graph..

[5]  J. Gower Generalized procrustes analysis , 1975 .

[6]  Karl Pearson F.R.S. LIII. On lines and planes of closest fit to systems of points in space , 1901 .

[7]  Wenjun Zeng,et al.  Fusing Wearable IMUs With Multi-View Images for Human Pose Estimation: A Geometric Approach , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Wojciech Matusik,et al.  Articulated mesh animation from multi-view silhouettes , 2008, ACM Trans. Graph..

[9]  Hans-Peter Seidel,et al.  Free-viewpoint video of human actors , 2003, ACM Trans. Graph..

[10]  Dario Pavllo,et al.  3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Hans-Peter Seidel,et al.  Markerless motion capture of interacting characters using multi-view image segmentation , 2011, CVPR 2011.

[12]  A. Hilton,et al.  3D Human Pose Estimation From Multi Person Stereo 360 Scenes , 2019, CVPR Workshops.

[13]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[14]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Yinghao Huang,et al.  Towards Accurate Marker-Less Human Shape and Pose Estimation over Time , 2017, 2017 International Conference on 3D Vision (3DV).

[16]  Pascal Fua,et al.  Mo2Cap2: Real-time Mobile 3D Motion Capture with a Cap-mounted Fisheye Camera , 2018, IEEE Transactions on Visualization and Computer Graphics.

[17]  Christian Theobalt,et al.  DeepCap: Monocular Human Performance Capture Using Weak Supervision , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Adrian Hilton,et al.  Deep Convolutional Networks for Marker-less Human Pose Estimation from Multiple Views , 2016, CVMP 2016.

[20]  Hans-Peter Seidel,et al.  VNect , 2017, ACM Trans. Graph..

[21]  Bingbing Ni,et al.  Deep Kinematics Analysis for Monocular 3D Human Pose Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Charles Malleson,et al.  Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors , 2017, BMVC.

[23]  J. Collomosse,et al.  Real-Time Full-Body Motion Capture from Video and IMUs , 2017, 2017 International Conference on 3D Vision (3DV).

[24]  Takeo Kanade,et al.  Panoptic Studio: A Massively Multiview System for Social Motion Capture , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Kostas Daniilidis,et al.  Convolutional Mesh Regression for Single-Image Human Shape Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Hans-Peter Seidel,et al.  Performance capture from sparse multi-view video , 2008, ACM Trans. Graph..

[28]  Bodo Rosenhahn,et al.  Human Pose Estimation from Video and IMUs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.