Multi-Person Pose Estimation via Multi-Layer Fractal Network and Joints Kinship Pattern

We propose an effective method to boost the accuracy of multi-person pose estimation in images. Initially, the three-layer fractal network was constructed to regress multi-person joints location heatmap that can help to enhance an image region with receptive field and capture more joints local-contextual feature information, thereby producing keypoints heatmap intermediate prediction to optimize human body joints regression results. Subsequently, the hierarchical bi-directional inference algorithm was proposed to calculate the degree of relatedness (call it Kinship) for adjacent joints, and it combines the Kinship between adjacent joints with the spatial constraints, which we refer to as joints kinship pattern matching mechanism, to determine the best matched joints pair. We iterate the above-mentioned joints matching process layer by layer until all joints are assigned to a corresponding individual. Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art schemes and achieves about 1% and 0.6% increase in mAP on MPII multi-person subset and MSCOCO 2016 keypoints challenge.

[1]  Ran Xu,et al.  Combining Skeletal Pose with Local Motion for Human Activity Recognition , 2012, AMDO.

[2]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[3]  Georgios Tzimiropoulos,et al.  Human Pose Estimation via Convolutional Part Heatmap Regression , 2016, ECCV.

[4]  Bernt Schiele,et al.  DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model , 2016, ECCV.

[5]  Hantao Yao,et al.  Deep Representation Learning With Part Loss for Person Re-Identification , 2017, IEEE Transactions on Image Processing.

[6]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Bernt Schiele,et al.  Articulated people detection and pose estimation: Reshaping the future , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[11]  Juergen Gall,et al.  Multi-person Pose Estimation with Local Joint-to-Person Associations , 2016, ECCV Workshops.

[12]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[13]  Alan L. Yuille,et al.  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[14]  Kang Zheng,et al.  Combining local appearance and holistic view: Dual-Source Deep Neural Networks for human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Jonathan Tompson,et al.  Learning Human Pose Estimation Features with Convolutional Networks , 2013, ICLR.

[16]  Yi Yang,et al.  Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xiaoqin Zhang,et al.  Efficient human pose estimation via parsing a tree structure based human model , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Andrew Zisserman,et al.  Recurrent Human Pose Estimation , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[21]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Jonathan Tompson,et al.  Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Dacheng Tao,et al.  Real Time Fine-Grained Categorization with Accuracy and Interpretability , 2016, ArXiv.

[24]  Bernt Schiele,et al.  ArtTrack: Articulated Multi-Person Tracking in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Gregory Shakhnarovich,et al.  FractalNet: Ultra-Deep Neural Networks without Residuals , 2016, ICLR.

[26]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Luc Van Gool,et al.  Efficient Non-Maximum Suppression , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[28]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[29]  Alan L. Yuille,et al.  Adaptive occlusion state estimation for human pose tracking under self-occlusions , 2013, Pattern Recognit..

[30]  Alan L. Yuille,et al.  An Approach to Pose-Based Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  David Zhang,et al.  A face and palmprint recognition approach based on discriminant DCT feature extraction , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[32]  Syed Abdul Rahman Abu-Bakar,et al.  Neural Network-Based Approach for Face Recognition Across Varying Pose , 2015, Int. J. Pattern Recognit. Artif. Intell..

[33]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[34]  Zhiao Huang,et al.  Associative Embedding: End-to-End Learning for Joint Detection and Grouping , 2016, NIPS.

[35]  Xiaogang Wang,et al.  Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jitendra Malik,et al.  Human Pose Estimation with Iterative Error Feedback , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Jonathan Tompson,et al.  MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation , 2014, ACCV.

[38]  Baowen Xu,et al.  Super-resolution Person re-identification with semi-coupled low-rank discriminant dictionary learning , 2015, CVPR.

[39]  Vittorio Ferrari,et al.  We Are Family: Joint Pose Estimation of Multiple Persons , 2010, ECCV.

[40]  Yi Li,et al.  Beyond Physical Connections: Tree Models in Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Jonathan Tompson,et al.  Towards Accurate Multi-person Pose Estimation in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.