A Two-Step Approach for Joint Detection

Human joint locations are used in areas such as activity recognition, animation, augmented reality, and robotics, so these points need to be detected accurately. This paper proposes a two-step method for joint detection. First, an initial prediction is made using the HRNet network. Second, a convolutional neural network refines the initial prediction for each joint. Experiments were performed on the Unite the People dataset, where the proposed model improves on the results of the base model by 1.81%.
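The two-step pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. HRNet is replaced by precomputed per-joint heatmaps, and the per-joint refinement CNN is replaced by a hypothetical `centroid_offset` function. We also assume that refinement works by cropping a patch around each initial prediction and predicting a 2D offset, and that the input image is a single-channel array. The names `initial_predictions`, `crop_patch`, `centroid_offset`, `two_step_detect` and the `patch_size` parameter are all hypothetical.

```python
import numpy as np


def initial_predictions(heatmaps: np.ndarray) -> np.ndarray:
    """Step 1 stand-in: argmax of each joint heatmap (J, H, W) -> (J, 2) as (x, y).

    Assumption: precomputed heatmaps play the role of the HRNet output.
    """
    num_joints, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1).astype(float)


def crop_patch(image: np.ndarray, center, size: int) -> np.ndarray:
    """Crop a size x size patch centred on (x, y), zero-padding at the borders.

    Use an odd `size` so the predicted joint sits exactly at the patch centre.
    """
    half = size // 2
    padded = np.pad(image, ((half, half), (half, half)), mode="constant")
    x, y = int(round(center[0])), int(round(center[1]))
    # Original pixel (y, x) sits at (y + half, x + half) in `padded`,
    # so the patch centred on it spans rows y..y+size-1 and cols x..x+size-1.
    return padded[y:y + size, x:x + size]


def centroid_offset(patch: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the per-joint CNN (NOT the paper's network).

    Returns the intensity-weighted centroid of the patch as a (dx, dy)
    offset from the patch centre.
    """
    size = patch.shape[0]
    total = patch.sum()
    if total <= 0:
        return np.zeros(2)
    coords = np.arange(size) - size // 2
    dx = (patch.sum(axis=0) * coords).sum() / total  # column sums weight x
    dy = (patch.sum(axis=1) * coords).sum() / total  # row sums weight y
    return np.array([dx, dy])


def two_step_detect(heatmaps, image, refiners, patch_size=15):
    """Step 1: coarse joints from heatmaps. Step 2: refine each joint locally."""
    coarse = initial_predictions(heatmaps)
    refined = coarse.copy()
    for j, refine in enumerate(refiners):
        patch = crop_patch(image, coarse[j], patch_size)
        refined[j] = coarse[j] + refine(patch)
    return coarse, refined


if __name__ == "__main__":
    # Toy example: the heatmap peak is one pixel left of the true joint.
    heatmaps = np.zeros((1, 32, 32))
    heatmaps[0, 12, 10] = 1.0
    image = np.zeros((32, 32))
    image[12, 11] = 1.0
    coarse, refined = two_step_detect(heatmaps, image, [centroid_offset])
    print("coarse:", coarse, "refined:", refined)
```

In a real system, each entry in `refiners` would be a trained convolutional network, one per joint, as the abstract describes. The crop-then-predict-offset interface is only one plausible way to connect the two steps.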
