Paper information - Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation

Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation

Various deep learning techniques have been proposed to solve the single-view 2D-to-3D pose estimation problem. While average prediction accuracy has improved significantly over the years, performance on hard poses, such as those with depth ambiguity, self-occlusion, or complex and rare configurations, is still far from satisfactory. In this work, we target these hard poses and present a novel skeletal GNN learning solution. Specifically, we propose a hop-aware hierarchical channel-squeezing fusion layer that effectively extracts relevant information from neighboring nodes while suppressing undesired noise in GNN learning. In addition, we propose a temporal-aware dynamic graph construction procedure that is robust and effective for 3D pose estimation. Experimental results on the Human3.6M dataset show that our solution achieves a 10.3% improvement in average prediction accuracy and greatly improves on hard poses over state-of-the-art techniques. We further apply the proposed technique to the skeleton-based action recognition task and also achieve state-of-the-art performance. Our code is available at https://github.com/ailingzengzzz/Skeletal-GNN.
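The abstract only names the hop-aware hierarchical channel-squeezing fusion layer, so the following is a minimal sketch of one way such a layer could be read, not the authors' implementation. It assumes that neighbors are grouped by hop distance on the skeleton graph, that each hop group is mean-aggregated, that hop k is projected to `base_dim // squeeze**k` channels (so farther, noisier hops get fewer channels), and that the per-hop outputs are fused by concatenation. The function names, the halving rule, and the random weights standing in for learned parameters are all hypothetical.

```python
import random
from collections import deque


def hop_distances(edges, num_nodes):
    """All-pairs hop counts on an undirected graph, computed by BFS."""
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[None] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def make_weights(in_dim, out_dim, rng):
    """Random in_dim x out_dim matrix; a stand-in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(out_dim)]
            for _ in range(in_dim)]


def project(vec, weights):
    """Multiply a row vector by a weight matrix."""
    out_dim = len(weights[0])
    return [sum(vec[i] * weights[i][j] for i in range(len(vec)))
            for j in range(out_dim)]


def hop_aware_fusion(features, edges, max_hop, base_dim, squeeze=2, seed=0):
    """Fuse per-hop neighbor features, squeezing channels as hops grow.

    Assumption (not from the paper): hop k gets base_dim // squeeze**k
    output channels, with a floor of one channel.
    """
    rng = random.Random(seed)
    n = len(features)
    in_dim = len(features[0])
    dist = hop_distances(edges, n)
    dims = [max(1, base_dim // squeeze ** k) for k in range(max_hop + 1)]
    weights = [make_weights(in_dim, d, rng) for d in dims]

    fused = []
    for node in range(n):
        out = []
        for k in range(max_hop + 1):
            # Nodes exactly k hops away (k = 0 is the node itself).
            members = [j for j in range(n) if dist[node][j] == k]
            if members:
                mean = [sum(features[j][c] for j in members) / len(members)
                        for c in range(in_dim)]
            else:
                mean = [0.0] * in_dim  # no node at this hop distance
            out.extend(project(mean, weights[k]))
        fused.append(out)
    return fused, dims


# Toy 5-joint chain "skeleton" with 2D joint coordinates as input features.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
features = [[float(i), 0.5 * float(i)] for i in range(5)]
fused, dims = hop_aware_fusion(features, edges, max_hop=2, base_dim=8)
print(dims, len(fused), len(fused[0]))
```

In this toy setting the per-hop channel widths shrink from the node itself outward, which is the only property the sketch is meant to illustrate; how the published layer chooses widths and fuses hops should be checked against the linked repository.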
