论文信息 - Mobile. Egocentric Human Body Motion Reconstruction Using Only Eyeglasses-mounted Cameras and a Few Body-worn Inertial Sensors

Mobile. Egocentric Human Body Motion Reconstruction Using Only Eyeglasses-mounted Cameras and a Few Body-worn Inertial Sensors

We envision a convenient telepresence system available to users anywhere, anytime. Such a system requires displays and sensors embedded in commonly worn items such as eyeglasses, wristwatches, and shoes. To that end, we present a standalone real-time system for the dynamic 3D capture of a person, relying only on cameras embedded into a head-worn device, and on Inertial Measurement Units (IMUs) worn on the wrists and ankles. Our prototype system egocentrically reconstructs the wearer's motion via learning-based pose estimation, which fuses inputs from visual and inertial sensors that complement each other, overcoming challenges such as inconsistent limb visibility in head-worn views, as well as pose ambiguity from sparse IMUs. The estimated pose is continuously re-targeted to a prescanned surface model, resulting in a high-fidelity 3D reconstruction. We demonstrate our system by reconstructing various human body movements and show that our visual-inertial learning-based method, which runs in real time, outperforms both visual-only and inertial-only approaches. We captured an egocentric visual-inertial 3D human pose dataset publicly available at https://sites.google.com/site/youngwooncha/egovip for training and evaluating similar methods.

[1] Zhen He,et al. Numerical Coordinate Regression with Convolutional Neural Networks , 2018, ArXiv.

[2] Mao Ye,et al. Fast Human Pose Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Michael J. Black,et al. SMPL: A Skinned Multi-Person Linear Model , 2023 .

[4] Michael J. Black,et al. STAR: Sparse Trained Articulated Human Body Regressor , 2020, ECCV.

[5] Ken Sakurada,et al. OpenVSLAM: A Versatile Visual SLAM Framework , 2019, ACM Multimedia.

[6] Ming Yang,et al. Instance-level Human Parsing via Part Grouping Network , 2018, ECCV.

[7] Simon Lucey,et al. Learning Depth from Monocular Videos Using Direct Methods , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8] Qionghai Dai,et al. SimulCap : Single-View Human Performance Capture With Cloth Simulation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Bodo Rosenhahn,et al. Supplementary Material to: Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera , 2018 .

[10] Bodo Rosenhahn,et al. Human Pose Estimation from Video and IMUs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Charles Malleson,et al. Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors , 2017, BMVC.

[12] Kristen Grauman,et al. Seeing Invisible Poses: Estimating 3D Body Pose from Egocentric Video , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Yaser Sheikh,et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Hans-Peter Seidel,et al. Motion reconstruction using sparse accelerometer data , 2011, TOGS.

[15] Cordelia Schmid,et al. BodyNet: Volumetric Inference of 3D Human Body Shapes , 2018, ECCV.

[16] Bodo Rosenhahn,et al. Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs , 2017, Comput. Graph. Forum.

[17] Peter V. Gehler,et al. Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[18] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.

[19] Christian Theobalt,et al. LiveCap , 2018, ACM Trans. Graph..

[20] Pascal Fua,et al. Mo2Cap2: Real-time Mobile 3D Motion Capture with a Cap-mounted Fisheye Camera , 2018, IEEE Transactions on Visualization and Computer Graphics.

[21] Christian Theobalt,et al. DeepCap: Monocular Human Performance Capture Using Weak Supervision , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Michael J. Black,et al. VIBE: Video Inference for Human Body Pose and Shape Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Yaser Sheikh,et al. Monocular Total Capture: Posing Face, Body, and Hands in the Wild , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Jan-Michael Frahm,et al. Towards Fully Mobile 3D Face, Body, and Environment Capture Using Only Head-worn Cameras , 2018, IEEE Transactions on Visualization and Computer Graphics.

[25] Ruben Villegas,et al. Learning to Generate Long-term Future via Hierarchical Prediction , 2017, ICML.

[26] Yaser Sheikh,et al. Motion capture from body-mounted cameras , 2011, ACM Trans. Graph..

[27] Jia Deng,et al. Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[28] Yichen Wei,et al. Integral Human Pose Regression , 2017, ECCV.

[29] J. Collomosse,et al. Real-Time Full-Body Motion Capture from Video and IMUs , 2017, 2017 International Conference on 3D Vision (3DV).

[30] Lourdes Agapito,et al. xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31] Abhishek Sharma,et al. Learning 3D Human Pose from Structure and Motion , 2017, ECCV.

[32] Danica Kragic,et al. Deep Representation Learning for Human Motion Prediction and Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Iasonas Kokkinos,et al. DensePose: Dense Human Pose Estimation in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34] Hans-Peter Seidel,et al. VNect , 2017, ACM Trans. Graph..

[35] Michael J. Black,et al. Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time , 2018 .

[36] Da-Yuan Huang,et al. Cyclops: Wearable and Single-Piece Full-Body Gesture Input Devices , 2015, CHI.

[37] Bernt Schiele,et al. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[38] Pascal Fua,et al. Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision , 2016, 2017 International Conference on 3D Vision (3DV).

[39] Michael J. Black,et al. On Human Motion Prediction Using Recurrent Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40] Bo Wang,et al. Occlusion-Aware Networks for 3D Human Pose Estimation in Video , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41] Dimitrios Tzionas,et al. Expressive Body Capture: 3D Hands, Face, and Body From a Single Image , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Hans-Peter Seidel,et al. EgoCap , 2016, ACM Trans. Graph..

[43] Pascal Fua,et al. XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera , 2019, ACM Trans. Graph..