Paper Information - 3-D Human Pose Estimation Using Cascade of Multiple Neural Networks

Estimating three-dimensional (3-D) human poses from a given two-dimensional (2-D) shape remains an inherently ill-posed problem in computer vision. This paper proposes a method called cascade of multiple neural networks (CMNN) that addresses the problem in two steps: 1) create an initial estimate of the 3-D shape using the method of Zhou et al. with a small number of basis shapes, and 2) use the CMNN to bring this initial shape closer to the original shape. Compared with existing works, the proposed method performs significantly better in both accuracy and processing time. This paper also introduces a new system called Human3D that can estimate the 3-D poses of all people in a single RGB image. The system comprises two parts: a convolutional pose machine (CPM) that estimates the 2-D poses of all people in an RGB image, and the CMNN, which reconstructs their 3-D poses from the outputs of the CPM.
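The two-step idea in the abstract (start from a rough 3-D estimate, then refine it stage by stage) can be illustrated with a generic cascaded-regression sketch. This is not the authors' CMNN: the abstract does not give the network architecture, the stage inputs, or the update rule, so the additive residual update, the ridge-regularised linear stages standing in for neural networks, the function names `fit_cascade` and `apply_cascade`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def _with_bias(shapes):
    # Append a constant column so each stage can learn an offset.
    return np.hstack([shapes, np.ones((shapes.shape[0], 1))])


def fit_cascade(initial, target, n_stages=3, ridge=1e-3):
    """Fit a cascade of residual regressors (hypothetical stand-in for CMNN).

    initial: (n_samples, d) flattened initial 3-D shapes, assumed to come
             from some first-step estimator.
    target:  (n_samples, d) flattened ground-truth 3-D shapes.
    """
    stages = []
    current = initial.copy()
    for _ in range(n_stages):
        residual = target - current
        X = _with_bias(current)
        # Closed-form ridge regression: (X^T X + ridge * I) W = X^T residual
        W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ residual)
        stages.append(W)
        # Each stage moves the estimate toward the target.
        current = current + X @ W
    return stages


def apply_cascade(stages, initial):
    """Refine initial shapes by applying every fitted stage in order."""
    current = initial.copy()
    for W in stages:
        current = current + _with_bias(current) @ W
    return current


# Synthetic demo: the "initial" shapes are a scaled, shifted, noisy copy
# of the targets, mimicking a coarse first-step estimate.
rng = np.random.default_rng(0)
target = rng.normal(size=(200, 6))
initial = 0.8 * target + 0.3 + 0.05 * rng.normal(size=target.shape)

stages = fit_cascade(initial, target)
refined = apply_cascade(stages, initial)

err_before = np.sqrt(np.mean((initial - target) ** 2))
err_after = np.sqrt(np.mean((refined - target) ** 2))
print(f"RMSE before: {err_before:.3f}, after: {err_after:.3f}")
```

With linear stages, almost all of the improvement comes from the first stage. Later stages only become useful when each stage is a nonlinear model, which is where a cascade of neural networks would differ from this sketch.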

Kang-Hyun Jo | Van-Thanh Hoang

[1] Yaser Sheikh, et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Deva Ramanan, et al. Analyzing 3D Objects in Cluttered Images, 2012, NIPS.

[3] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.

[4] Michael J. Black, et al. Pose-conditioned joint angle limits for 3D human pose reconstruction, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Camillo J. Taylor, et al. Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image, 2000, Comput. Vis. Image Underst.

[6] Yichen Wei, et al. Compositional Human Pose Regression, 2018, Comput. Vis. Image Underst.

[7] Zhi Zhang, et al. Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation, 2017, IEEE Transactions on Multimedia.

[8] Song Guo, et al. Big Data Meet Green Challenges: Big Data Toward Green Applications, 2016, IEEE Systems Journal.

[9] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[10] T. Kanade, et al. Reconstructing 3D Human Pose from 2D Image Landmarks, 2012.

[11] Xiaowei Zhou, et al. Sparse Representation for 3D Shape Estimation: A Convex Relaxation Approach, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] A. Elgammal, et al. Inferring 3D body pose from silhouettes using activity manifold learning, 2004, CVPR 2004.

[13] Bernt Schiele, et al. Detailed 3D Representations for Object Recognition and Modeling, 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Xiu-Shen Wei, et al. Adversarial PoseNet: A Structure-Aware Convolutional Network for Human Pose Estimation, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15] Xiaowei Zhou, et al. Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Xiaowei Zhou, et al. Articulated motion estimation from a monocular image sequence using spherical tangent bundles, 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[17] Michal Niedzwiecki, et al. Hand Body Language Gesture Recognition Based on Signals From Specialized Glove and Machine Learning Algorithms, 2016, IEEE Transactions on Industrial Informatics.

[18] Michael J. Black, et al. Predicting 3D People from 2D Pictures, 2006, AMDO.

[19] Mohan M. Trivedi, et al. 3-D Posture and Gesture Recognition for Interactivity in Smart Spaces, 2012, IEEE Transactions on Industrial Informatics.

[20] Jonathan Kofman, et al. Teleoperation of a robot manipulator using a vision-based human-robot interface, 2005, IEEE Transactions on Industrial Electronics.

[21] Yang Yi, et al. Reservoir Computing Meets Smart Grids: Attack Detection Using Delayed Feedback Networks, 2018, IEEE Transactions on Industrial Informatics.

[22] Yi Yang, et al. Articulated pose estimation with flexible mixtures-of-parts, 2011, CVPR 2011.

[23] Van-Dung Hoang, et al. An improved method for 3D shape estimation using cascade of neural networks, 2017, 2017 IEEE 15th International Conference on Industrial Informatics (INDIN).

[24] Fernando De la Torre, et al. Spatio-temporal Matching for Human Detection in Video, 2014, ECCV.

[25] Xiaogang Wang, et al. Learning Feature Pyramids for Human Pose Estimation, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26] Youjie Zhou, et al. Pose Locality Constrained Representation for 3D Human Pose Reconstruction, 2014, ECCV.

[27] Jia Deng, et al. Stacked Hourglass Networks for Human Pose Estimation, 2016, ECCV.

[28] Song Guo, et al. Big Data Meet Green Challenges: Greening Big Data, 2016, IEEE Systems Journal.

[29] Alexei A. Efros, et al. Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30] Ankur Agarwal, et al. Recovering 3D human pose from monocular images, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31] James J. Little, et al. A Simple Yet Effective Baseline for 3d Human Pose Estimation, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32] Michael J. Black, et al. Estimating human shape and pose from a single image, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[33] Josephine Sullivan, et al. One millisecond face alignment with an ensemble of regression trees, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Xiaogang Wang, et al. Multi-context Attention for Human Pose Estimation, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Wen Gao, et al. Robust Estimation of 3D Human Poses from a Single Image, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36] Cristian Sminchisescu, et al. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37] Jian Sun, et al. Face Alignment by Explicit Shape Regression, 2012, International Journal of Computer Vision.

[38] T. Kanade, et al. Reconstructing 3D Human Pose from 2D Image Landmarks, 2012, ECCV.

[39] Hai Zhao, et al. Wearable Continuous Body Temperature Measurement Using Multiple Artificial Neural Networks, 2018, IEEE Transactions on Industrial Informatics.

[40] Deva Ramanan, et al. 3D Human Pose Estimation = 2D Pose Estimation + Matching, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Bernt Schiele, et al. Monocular 3D pose estimation and tracking by detection, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[42] Pietro Perona, et al. Cascaded pose regression, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[43] Bernt Schiele, et al. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[44] Silvio Savarese, et al. Estimating the aspect layout of object categories, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[45] Timothy F. Cootes, et al. Active Shape Models-Their Training and Application, 1995, Comput. Vis. Image Underst.