HOPE: Heatmap and Offset for Pose Estimation

Human pose estimation with deep neural networks has advanced significantly in recent years. However, the precision lost when predicted coordinates are mapped back to the original image has been neglected. In this paper, we propose a simple but effective method that uses Heatmap and Offset for Pose Estimation (HOPE). HOPE follows the general top-down approach: a detector first generates human bounding boxes, and the keypoints are then located in each cropped box image. To alleviate the precision loss of the mapping process, HOPE embeds the coordinate offset into the structure of the neural network, so that the network learns the slight offset introduced by the mapping in an end-to-end manner, which improves accuracy. Experimental results on the multi-person pose estimation dataset MSCOCO, the single-person pose estimation dataset MPII, and the CrowdPose dataset indicate that our method achieves state-of-the-art performance in terms of accuracy and computational complexity.
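
The precision loss described above can be illustrated with a minimal sketch. It is not the HOPE architecture itself: the abstract does not specify how the offset is parameterized, so the code assumes a fractional offset inside each heatmap cell and a fixed downsampling factor `stride` between the cropped image and the heatmap. The function names `encode_target` and `decode_keypoint` are hypothetical.

```python
import numpy as np


def encode_target(x, y, stride):
    """Split a ground-truth coordinate into a heatmap cell and a fractional offset.

    Assumption: the heatmap is `stride` times smaller than the cropped image.
    """
    gx, gy = x / stride, y / stride
    col, row = int(np.floor(gx)), int(np.floor(gy))
    # The fractional parts are what a plain heatmap argmax cannot represent.
    return (row, col), (gx - col, gy - row)


def decode_keypoint(heatmap, offset_x, offset_y, stride):
    """Map the heatmap peak back to image coordinates, with and without the offset.

    heatmap, offset_x, offset_y: 2-D arrays of identical shape (H, W).
    """
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # Heatmap only: the estimate is locked to the coarse grid.
    coarse = (float(col * stride), float(row * stride))
    # Heatmap plus offset: add the sub-cell correction before scaling up.
    refined = (float((col + offset_x[row, col]) * stride),
               float((row + offset_y[row, col]) * stride))
    return coarse, refined


# Toy example: a keypoint at (13, 22) in the crop, heatmap stride of 4.
stride = 4
(row, col), (dx, dy) = encode_target(13.0, 22.0, stride)

heatmap = np.zeros((8, 8))
offset_x = np.zeros((8, 8))
offset_y = np.zeros((8, 8))
heatmap[row, col] = 1.0      # stand-in for a perfect network prediction
offset_x[row, col] = dx
offset_y[row, col] = dy

coarse, refined = decode_keypoint(heatmap, offset_x, offset_y, stride)
```

In this toy case the heatmap-only estimate lands on the grid point (12, 20), while adding the offset recovers (13, 22). In a trained model the offset maps would be predicted by the network rather than filled in from the ground truth.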
