Paper Information - Learning Pose Grammar for Monocular 3D Pose Estimation

Learning Pose Grammar for Monocular 3D Pose Estimation

In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation from a monocular RGB image. Our model takes an estimated 2D pose as input and learns a generalized 2D-to-3D mapping function that lifts it to a 3D pose. The proposed model consists of a base network, which efficiently captures pose-aligned features, and a hierarchy of Bi-directional RNNs (BRNNs) on top, which explicitly incorporates knowledge about human body configuration (i.e., kinematics, symmetry, and motor coordination). The model thus enforces high-level constraints on human poses. For learning, we develop a data augmentation algorithm that further improves the model's robustness to appearance variations and its cross-view generalization ability. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol for the cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty in this setting, whereas our method handles it well.
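To make the role of a bi-directional recurrence over body joints more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single hypothetical kinematic chain (shoulder, elbow, wrist), uses fixed scalar weights in place of learned parameters, and treats raw 2D coordinates as the per-joint features. The function names `rnn_pass` and `brnn_chain` are illustrative only.

```python
import math


def rnn_pass(features, w_in, w_rec):
    """Run a simple tanh recurrence along a sequence of joint features.

    Assumption: scalar weights shared across dimensions stand in for the
    learned weight matrices of a real RNN.
    """
    hidden = [0.0] * len(features[0])
    states = []
    for x in features:
        hidden = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, hidden)]
        states.append(hidden)
    return states


def brnn_chain(features, w_in=0.5, w_rec=0.5):
    """Combine a root-to-leaf pass and a leaf-to-root pass along one chain.

    Each joint's output concatenates its forward state (context from joints
    closer to the root) with its backward state (context from joints closer
    to the leaf).
    """
    forward = rnn_pass(features, w_in, w_rec)
    backward = rnn_pass(features[::-1], w_in, w_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Hypothetical 2D joint positions along one arm: shoulder, elbow, wrist.
arm = [[0.0, 0.0], [0.1, -0.3], [0.2, -0.6]]
for name, state in zip(["shoulder", "elbow", "wrist"], brnn_chain(arm)):
    print(name, [round(v, 3) for v in state])
```

In this sketch, moving the wrist changes the backward half of the shoulder's state while leaving its forward half untouched, which illustrates how a bi-directional pass lets every joint in a chain see context from both ends.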
