Paper Information - Deeply Learned Compositional Models for Human Pose Estimation


Compositional models represent patterns with hierarchies of meaningful parts and subparts. Their ability to characterize high-order relationships among body parts helps resolve low-level ambiguities in human pose estimation (HPE). However, prior compositional models make unrealistic assumptions about subpart-part relationships, which leaves them unable to characterize complex compositional patterns. Moreover, the state spaces of their higher-level parts can be exponentially large, complicating both inference and learning. To address these issues, this paper introduces a novel framework for HPE, termed the Deeply Learned Compositional Model (DLCM). It exploits deep neural networks to learn the compositionality of human bodies, which results in a novel network with a hierarchical compositional architecture and bottom-up/top-down inference stages. In addition, we propose a novel bone-based part representation. It not only compactly encodes the orientations, scales and shapes of parts, but also avoids their potentially large state spaces. With significantly lower complexity, our approach outperforms state-of-the-art methods on three benchmark datasets.
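To make the bone-based idea concrete, here is a minimal sketch, assuming a part is drawn as a Gaussian band around the line segment joining two of its joints. The abstract does not specify the rendering, so the function name `bone_heatmap`, the `sigma` value, the grid size and the joint coordinates are all illustrative assumptions rather than the paper's implementation. The point it illustrates is that a single fixed-size map can carry a part's position, orientation and length at once, so no explicit enumeration of (location, orientation, scale) states is needed.

```python
import math

def bone_heatmap(joint_a, joint_b, height, width, sigma=1.5):
    """Render one bone as a 2D map (hypothetical helper, not from the paper).

    Each pixel holds a Gaussian of its distance to the segment joint_a-joint_b,
    so the segment's orientation and length are visible in the map itself.
    """
    (ax, ay), (bx, by) = joint_a, joint_b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            if seg_len_sq == 0:
                t = 0.0  # degenerate bone: both joints coincide
            else:
                # Project the pixel onto the segment and clamp to its endpoints.
                t = ((x - ax) * dx + (y - ay) * dy) / seg_len_sq
                t = max(0.0, min(1.0, t))
            px, py = ax + t * dx, ay + t * dy
            dist_sq = (x - px) ** 2 + (y - py) ** 2
            row.append(math.exp(-dist_sq / (2 * sigma ** 2)))
        heatmap.append(row)
    return heatmap

# Illustrative joints in (x, y) pixel coordinates on an 8x8 grid.
elbow, wrist = (2, 2), (6, 4)
lower_arm = bone_heatmap(elbow, wrist, height=8, width=8)
```

In this sketch the map is indexed as `lower_arm[y][x]`. Its size depends only on the grid, not on how many orientations or scales the part could take, which is one reading of how such a representation sidesteps large state spaces.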
