论文信息 - Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation

Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation

In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network which efficiently captures pose-aligned features and a hierarchy of Bi-directional RNNs (BRNN) on the top to explicitly incorporate a set of knowledge regarding human body configuration (i.e., kinematics, symmetry, motor coordination). The proposed model thus enforces high-level constraints over human poses. In learning, we develop a pose sample simulator to augment training samples in virtual camera views, which further improves our model generalizability. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol working on cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty under such setting while our method can well handle such challenges.

[1] Mohan S. Kankanhalli,et al. Marker-Less 3D Human Motion Capture with Monocular Image Sequence and Height-Maps , 2016, ECCV.

[2] Song-Chun Zhu,et al. Image Parsing with Stochastic Scene Grammar , 2011, NIPS.

[3] Feng Han,et al. Bottom-Up/Top-Down Image Parsing with Attribute Grammar , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Peter K. Allen,et al. Articulated Pose Estimation Using Hierarchical Exemplar-Based Models , 2016, AAAI.

[5] Peter V. Gehler,et al. Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[6] Cristian Sminchisescu,et al. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Trevor Darrell,et al. Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8] Francesc Moreno-Noguer,et al. 3D Human Pose Estimation from a Single Image via Distance Matrix Regression , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Deva Ramanan,et al. 3D Human Pose Estimation = 2D Pose Estimation + Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Cordelia Schmid,et al. MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild , 2016, NIPS.

[11] Bingpeng Ma,et al. Person Search in a Scene by Jointly Modeling People Commonness and Person Uniqueness , 2014, ACM Multimedia.

[12] Bernt Schiele,et al. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Fiora Pirri,et al. Bayesian Image Based 3D Pose Estimation , 2016, ECCV.

[14] Antoni B. Chan,et al. 3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network , 2014, ACCV.

[15] Yichen Wei,et al. Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16] Song-Chun Zhu,et al. Monocular 3D Human Pose Estimation by Predicting Depth on Joints , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17] Liang Lin,et al. Human Re-identification by Matching Compositional Template with Cluster Sampling , 2013, 2013 IEEE International Conference on Computer Vision.

[18] Francesc Moreno-Noguer,et al. A Joint Model for 2D and 3D Pose Estimation from a Single Image , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[19] Zhuowen Tu,et al. Deeply-Supervised Nets , 2014, AISTATS.

[20] Michael J. Black,et al. HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[21] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[22] Song-Chun Zhu,et al. Attribute And-Or Grammar for Joint Parsing of Human Pose, Parts and Attributes , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Vincent Lepetit,et al. Direct Prediction of 3D Body Poses from Motion Compensated Sequences , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Xiaowei Zhou,et al. Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Jun Zhu,et al. Pose-Guided Human Parsing by an AND/OR Graph Using Pose-Context Features , 2016, AAAI.

[27] Andrew Zisserman,et al. Personalizing Human Video Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Ilya Kostrikov,et al. Depth Sweep Regression Forests for Estimating 3D Human Pose from Images , 2014, BMVC.

[29] James J. Little,et al. A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30] Jia Deng,et al. Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[31] Hao Jiang. 3D Human Pose Reconstruction Using Millions of Exemplars , 2010, 2010 20th International Conference on Pattern Recognition.

[32] Yang Liu,et al. Multi-view People Tracking via Hierarchical Trajectory Composition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Kobus Barnard,et al. Understanding Bayesian Rooms Using Composite 3D Object Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Niloy J. Mitra,et al. Creating consistent scene graphs using a probabilistic grammar , 2014, ACM Trans. Graph..

[35] Vincent Lepetit,et al. Structured Prediction of 3D Human Pose with Deep Neural Networks , 2016, BMVC.

[36] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[37] Juergen Gall,et al. A Dual-Source Approach for 3D Pose Estimation from a Single Image , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).