Cross Refinement Techniques for Markerless Human Motion Capture

This article presents a global 3D human pose estimation method for markerless motion capture. Given two calibrated images of a person, it first obtains the 2D joint locations in the images using a pre-trained 2D Pose CNN, then constructs the 3D pose based on stereo triangulation. To improve the accuracy and the stability of the system, we propose two efficient optimization techniques for the joints. The first one, called cross-view refinement, optimizes the joints based on epipolar geometry. The second one, called cross-joint refinement, optimizes the joints using bone-length constraints. Our method automatically detects and corrects the unreliable joint, and consequently is robust against heavy occlusion, symmetry ambiguity, motion blur, and highly distorted poses. We evaluate our method on a number of benchmark datasets covering indoors and outdoors, which showed that our method is better than or on par with the state-of-the-art methods. As an application, we create a 3D human pose dataset using the proposed motion capture system, which contains about 480K images of both indoor and outdoor scenes, and demonstrate the usefulness of the dataset for human pose estimation.

[1]  Antoni B. Chan,et al.  3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network , 2014, ACCV.

[2]  Stefan Carlsson,et al.  3D Pictorial Structures for Multiple View Articulated Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Michael J. Black,et al.  Pose-conditioned joint angle limits for 3D human pose reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Vincent Lepetit,et al.  Direct Prediction of 3D Body Poses from Motion Compensated Sequences , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Michael J. Black,et al.  Guest Editorial: State of the Art in Image- and Video-Based Human Pose and Motion Estimation , 2010, International Journal of Computer Vision.

[6]  Nassir Navab,et al.  3D Pictorial Structures for Multiple Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Fiora Pirri,et al.  Bayesian Image Based 3D Pose Estimation , 2016, ECCV.

[8]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[9]  Tiejun Huang,et al.  Sequential Deep Trajectory Descriptor for Action Recognition With Three-Stream CNN , 2016, IEEE Transactions on Multimedia.

[10]  Antoni B. Chan,et al.  Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Wen Gao,et al.  Robust Estimation of 3D Human Poses from a Single Image , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Bodo Rosenhahn,et al.  Joint 3D Human Motion Capture and Physical Analysis from Monocular Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13]  Yong Su,et al.  Sequential Articulated Motion Reconstruction from a Monocular Image Sequence , 2018, ACM Trans. Multim. Comput. Commun. Appl..

[14]  Xiaogang Wang,et al.  3D Human Pose Estimation in the Wild by Adversarial Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Pascal Fua,et al.  Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision , 2016, 2017 International Conference on 3D Vision (3DV).

[16]  Jitendra Malik,et al.  Human Pose Estimation with Iterative Error Feedback , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Song-Chun Zhu,et al.  Learning Knowledge-guided Pose Grammar Machine for 3D Human Pose Estimation , 2017, ArXiv.

[18]  Deva Ramanan,et al.  3D Human Pose Estimation = 2D Pose Estimation + Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Emre Akbas,et al.  Self-Supervised Learning of 3D Human Pose Using Multi-View Geometry , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Luc Van Gool,et al.  Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities , 2011, NIPS.

[21]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[22]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[23]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[25]  Xiaowei Zhou,et al.  3D Shape Reconstruction from 2D Landmarks: A Convex Formulation , 2014, ArXiv.

[26]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[27]  Jie Li,et al.  Bottom-up Pose Estimation of Multiple Person with Bounding Box Constraint , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).

[28]  Pascal Fua,et al.  Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  David J. Fleet,et al.  Dynamical binary latent variable models for 3D human pose tracking , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Juergen Gall,et al.  A Dual-Source Approach for 3D Pose Estimation from a Single Image , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Yichen Wei,et al.  Compositional Human Pose Regression , 2018, Comput. Vis. Image Underst..

[32]  Bernt Schiele,et al.  Multi-view Pictorial Structures for 3D Human Pose Estimation , 2013, BMVC.

[33]  Taku Komura,et al.  Interactive partner control in close interactions for real-time applications , 2013, TOMCCAP.

[34]  James J. Little,et al.  A Simple Yet Effective Baseline for 3d Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Andrew Zisserman,et al.  Flowing ConvNets for Human Pose Estimation in Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[36]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Bo Zhao,et al.  AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding , 2017, ArXiv.

[38]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Yichen Wei,et al.  Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Xiaowei Zhou,et al.  Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Christoph Schnörr,et al.  A Study of Parts-Based Object Class Detection Using Complete Graphs , 2010, International Journal of Computer Vision.

[42]  Wei Zhang,et al.  Deep Kinematic Pose Regression , 2016, ECCV Workshops.

[43]  Michael Isard,et al.  Loose-limbed People: Estimating 3D Human Pose and Motion Using Non-parametric Belief Propagation , 2011, International Journal of Computer Vision.

[44]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Hans-Peter Seidel,et al.  VNect , 2017, ACM Trans. Graph..

[46]  Fernando De la Torre,et al.  Spatio-temporal Matching for Human Detection in Video , 2014, ECCV.

[47]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Xiaowei Zhou,et al.  Harvesting Multiple Views for Marker-Less 3D Human Pose Annotations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Miaopeng Li,et al.  Multi-Person Pose Estimation Using Bounding Box Constraint and LSTM , 2019, IEEE Transactions on Multimedia.

[50]  Bodo Rosenhahn,et al.  A Kinematic Chain Space for Monocular Motion Capture , 2017, ECCV Workshops.

[51]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[52]  Edilson de Aguiar,et al.  MARCOnI—ConvNet-Based MARker-Less Motion Capture in Outdoor and Indoor Scenes , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Jonathan Tompson,et al.  Efficient ConvNet-based marker-less motion capture in general scenes with a low number of cameras , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[55]  Hossein Azizpour,et al.  Multi-view Body Part Recognition with Random Forests , 2013, BMVC.

[56]  Petr Sojka,et al.  Gait Recognition from Motion Capture Data , 2017, ACM Trans. Multim. Comput. Commun. Appl..

[57]  Bodo Rosenhahn,et al.  3D Reconstruction of Human Motion from Monocular Image Sequences , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[58]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[59]  Alan L. Yuille,et al.  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[60]  Alvaro Marcos-Ramiro,et al.  Let Your Body Speak: Communicative Cue Extraction on Natural Interaction Using RGBD Data , 2015, IEEE Transactions on Multimedia.

[61]  T. Kanade,et al.  Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.

[62]  Vincent Lepetit,et al.  Structured Prediction of 3D Human Pose with Deep Neural Networks , 2016, BMVC.

[63]  Chen Qian,et al.  Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Yen-Lin Chen,et al.  3D Reconstruction of Human Motion and Skeleton from Uncalibrated Monocular Video , 2009, ACCV.

[65]  Nassir Navab,et al.  3D Pictorial Structures Revisited: Multiple Human Pose Estimation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[66]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[67]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.