A-HRNet: Attention Based High Resolution Network for Human pose estimation

Recently, human pose estimation has received much attention in the research community due to its broad range of application scenarios. Most architectures for human pose estimation use multiple resolution networks, such as Hourglass, CPN, HRNet, etc. High Resolution Network (HRNet) is the latest SOTA architecture improved from Hourglass. In this paper, we propose a novel attention block that leverages a special Channel-Attention branch. We use this attention block as the building block and adopt the architecture of HRNet to build our Attention Based HRNet (A-HRNet). Experiments show that our model can consistently outperform HRNet on different datasets. Moreover, our model achieves the state-of-the-art performance on the COCO keypoint detection val2017 dataset (77.7 AP)1.

[1]  Dong Liu,et al.  Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  David Picard,et al.  2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[5]  Jun Fu,et al.  Dual Attention Network for Scene Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Ning Zhang,et al.  iHear Food: Eating Detection Using Commodity Bluetooth Headsets , 2016, 2016 IEEE First International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE).

[8]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[9]  Yichen Wei,et al.  Simple Baselines for Human Pose Estimation and Tracking , 2018, ECCV.

[10]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Xiaogang Wang,et al.  Multi-context Attention for Human Pose Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Xiaogang Wang,et al.  Residual Attention Network for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Chenxi Zhang,et al.  3D Anchor-Free Lesion Detector on Computed Tomography Scans , 2019, ArXiv.

[14]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[16]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[17]  Benyuan Liu,et al.  Human pose estimation based in-home lower body rehabilitation system , 2020, 2020 International Joint Conference on Neural Networks (IJCNN).

[18]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[20]  Gang Yu,et al.  Cascaded Pyramid Network for Multi-person Pose Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Li Fu,et al.  DNANet: De-Normalized Attention Based Multi-Resolution Network for Human Pose Estimation , 2019, ArXiv.

[22]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.