Spatio-temporal MLP-graph network for 3D human pose estimation

Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation. Despite their success, most of these methods consider only spatial correlations between body joints and ignore temporal correlations, which limits their ability to capture relationships in the presence of occlusions and inherent ambiguity. To address this weakness, we propose a spatio-temporal network architecture composed of a joint-mixing multi-layer perceptron block that facilitates communication among different joints and a graph weighted Jacobi network block that enables communication among various feature channels. The major novelty of our approach lies in a new weighted Jacobi feature propagation rule obtained through graph filtering with implicit fairing. We leverage temporal information from 2D pose sequences and integrate weight modulation into the model to disentangle the feature transformations of distinct nodes. We also employ adjacency modulation, which alters the graph topology through a learnable modulation matrix, to learn meaningful correlations beyond the predefined linkages between body joints. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our model, which outperforms recent state-of-the-art methods for 3D human pose estimation.
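
To make the idea behind the propagation rule more concrete, the following is a minimal numerical sketch of weighted Jacobi iterations applied to an implicit fairing system on a toy skeleton graph. It is not the layer used in the model, which additionally involves learnable weights, weight modulation, and adjacency modulation. The system form (I + sL)H = X with L = I - Â, the scaling parameter `s`, the relaxation weight `omega`, the 5-joint chain, and all function names are assumptions made purely for illustration.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetrically normalize an adjacency matrix: D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def weighted_jacobi_fairing(A_hat, X, s=1.0, omega=0.8, num_iters=200):
    """Approximate the solution H of (I + s L) H = X, with L = I - A_hat.

    Assumes A_hat has a zero diagonal, so the diagonal of the system
    matrix (1 + s) I - s A_hat is simply (1 + s) I. The weighted Jacobi
    update then reduces to
        H <- (1 - omega) H + omega / (1 + s) * (s A_hat H + X).
    """
    H = X.copy()
    for _ in range(num_iters):
        H = (1.0 - omega) * H + (omega / (1.0 + s)) * (s * (A_hat @ H) + X)
    return H


# Hypothetical toy skeleton: a chain of 5 joints (not a real body graph).
num_joints = 5
A = np.zeros((num_joints, num_joints))
for i in range(num_joints - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_hat = normalized_adjacency(A)

# Random 2D features per joint, standing in for 2D pose inputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((num_joints, 2))

s = 1.0
H_iter = weighted_jacobi_fairing(A_hat, X, s=s, omega=0.8)

# Reference: solve the same linear system directly.
L = np.eye(num_joints) - A_hat
H_direct = np.linalg.solve(np.eye(num_joints) + s * L, X)
```

With these settings the iterates agree with the direct solve to numerical precision, which illustrates why a few Jacobi-style updates can stand in for an explicit matrix inverse when smoothing features over the joint graph.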
