Dual Path Networks for Multi-Person Human Pose Estimation

The task of multi-person human pose estimation in natural scenes is quite challenging. Existing methods include both top-down and bottom-up approaches. The main advantage of bottom-up methods is its excellent tradeoff between estimation accuracy and computational cost. We follow this path and aim to design smaller, faster, and more accurate neural networks for the regression of keypoints and limb association vectors. These two regression tasks are naturally dependent on each other. In this work, we propose a dual-path network specially designed for multi-person human pose estimation, and compare our performance with the openpose network in aspects of model size, forward speed, and estimation accuracy.

[1]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[3]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Shimon Ullman,et al.  Using Linking Features in Learning Non-parametric Part Models , 2012, ECCV.

[8]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[10]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[12]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[13]  Bernt Schiele,et al.  DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model , 2016, ECCV.

[14]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Zhiao Huang,et al.  Associative Embedding: End-to-End Learning for Joint Detection and Grouping , 2016, NIPS.

[16]  Andrew Zisserman,et al.  Recurrent Human Pose Estimation , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[17]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[18]  Peng Wang,et al.  Joint Multi-person Pose Estimation and Semantic Part Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jonathan Tompson,et al.  Towards Accurate Multi-person Pose Estimation in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Ngoc Thang Vu,et al.  Densely Connected Convolutional Networks for Speech Recognition , 2018, ITG Symposium on Speech Communication.

[22]  Silvio Savarese,et al.  Articulated part-based model for joint object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[23]  Shuicheng Yan,et al.  Dual Path Networks , 2017, NIPS.

[24]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Yuandong Tian,et al.  Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation , 2012, ECCV.

[26]  Luc Van Gool,et al.  Human Pose Estimation Using Body Parts Dependent Joint Regressors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Ben Taskar,et al.  MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Peter V. Gehler,et al.  Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[30]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.