Human Pose Estimation from Sparse Inertial Measurements through Recurrent Graph Convolution

We propose the adjacency adaptive graph convolutional long short-term memory network (AAGC-LSTM) for human pose estimation from sparse inertial measurements obtained from only six inertial measurement units. The AAGC-LSTM captures both spatial and temporal dependencies in a single network operation. This is made possible by equipping graph convolutions with adjacency adaptivity, which also allows the network to learn unknown dependencies between human body joints. To further boost accuracy, we propose longitudinal loss weighting, which accounts for natural movement patterns, as well as body-aware contralateral data augmentation. By combining these contributions, we exploit the inherent graph structure of the human body and outperform the state of the art for human pose estimation from sparse inertial measurements.
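The abstract does not spell out the cell equations, so the following is only a minimal sketch of how a graph-convolutional LSTM step with an adaptive adjacency could look. It assumes (1) that each LSTM gate is computed by a graph convolution over the per-joint concatenation of the current input and the previous hidden state, and (2) that "adjacency adaptivity" means adding a learnable offset matrix `b_offset` to the normalized skeleton adjacency, as some adaptive GCN variants do. The names `AdaptiveGraphConvLSTMCell`, `normalize_adjacency` and `b_offset` are hypothetical, training is omitted, and plain Python is used only to keep the example self-contained.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of an (n x k) and a (k x m) matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a 0/1 skeleton graph."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


class AdaptiveGraphConvLSTMCell:
    """Hypothetical graph-convolutional LSTM cell with a learnable adjacency offset."""

    GATES = ("input", "forget", "output", "cell")

    def __init__(self, adjacency: Matrix, in_features: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.n = len(adjacency)
        self.hidden = hidden
        self.a_norm = normalize_adjacency(adjacency)
        # Assumed learnable offset B, zero-initialized: training starts from the
        # fixed skeleton and may add weight between joints that share no bone.
        self.b_offset = [[0.0] * self.n for _ in range(self.n)]
        fan_in = in_features + hidden
        limit = math.sqrt(6.0 / (fan_in + hidden))  # Glorot-uniform bound
        self.weights = {
            g: [[rng.uniform(-limit, limit) for _ in range(hidden)] for _ in range(fan_in)]
            for g in self.GATES
        }
        self.bias = {g: [0.0] * hidden for g in self.GATES}

    def step(self, x: Matrix, h: Matrix, c: Matrix) -> tuple[Matrix, Matrix]:
        """One time step; x is (joints x in_features), h and c are (joints x hidden)."""
        # Effective adjacency: normalized skeleton plus learned offset.
        a_eff = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(self.a_norm, self.b_offset)]
        # Concatenate input and previous hidden state per joint and aggregate over
        # neighbours once; since A (Z W) == (A Z) W, all four gates share this result.
        z = [xi + hi for xi, hi in zip(x, h)]
        agg = matmul(a_eff, z)
        pre = {
            g: [[v + bv for v, bv in zip(row, self.bias[g])] for row in matmul(agg, self.weights[g])]
            for g in self.GATES
        }
        h_new, c_new = [], []
        for node in range(self.n):
            h_row, c_row = [], []
            for k in range(self.hidden):
                i_g = sigmoid(pre["input"][node][k])
                f_g = sigmoid(pre["forget"][node][k])
                o_g = sigmoid(pre["output"][node][k])
                g_g = math.tanh(pre["cell"][node][k])
                c_val = f_g * c[node][k] + i_g * g_g  # standard LSTM cell update
                c_row.append(c_val)
                h_row.append(o_g * math.tanh(c_val))
            h_new.append(h_row)
            c_new.append(c_row)
        return h_new, c_new


if __name__ == "__main__":
    # Toy 4-joint chain; a real skeleton graph has many more joints.
    chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    cell = AdaptiveGraphConvLSTMCell(chain, in_features=3, hidden=5)
    rng = random.Random(1)
    h = [[0.0] * 5 for _ in range(4)]
    c = [[0.0] * 5 for _ in range(4)]
    for _ in range(10):
        x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
        h, c = cell.step(x, h, c)
    print(len(h), len(h[0]))  # -> 4 5
```

Because the offset starts at zero, the sketch initially behaves like a graph-convolutional LSTM on the fixed skeleton; any learned nonzero entry in `b_offset` lets information flow between joints that are not directly connected, which is one plausible reading of "learning unknown dependencies of the human body joints".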
