Sequential Articulated Motion Reconstruction from a Monocular Image Sequence

In this article, we present a sequential approach for estimating articulated motion from a 2D skeleton sequence. This is a challenging task because of the complexity of human movement and the inherent depth ambiguities. The proposed approach models human movement on a kinematic manifold together with its tangent bundle, which is a natural geometrical representation of articulated motion. Combining this representation with a second-order stochastic dynamic model based on the Markov hypothesis, we generalize the extended Rauch-Tung-Striebel smoother to a Riemannian manifold to simulate the process of human movement. The human motor system may violate the Markov hypothesis when the body is subject to external forces, so a refinement stage is introduced to correct the estimation error. Specifically, the current estimate is refined within a feasible solution region spanned by a set of local estimates. This region is a simplex, in which each element can be represented as a convex combination of those local estimates. We prove that the refinement problem can be converted into a convex optimization problem with a simplicial constraint. Because the proposed formulation conforms to the kinematic and spatio-temporal continuity of articulated motion, the reconstruction ambiguity is substantially alleviated. The proposed algorithm is evaluated on multiple synthetic sequences from the CMU and HDM05 MoCap databases. The results show that, without requiring any training data, the proposed approach achieves greater accuracy than state-of-the-art baselines. Furthermore, the proposed approach outperforms two baselines on real sequences from the Human3.6M MoCap database.
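As a rough illustration of what a refinement under a simplicial constraint can look like, the sketch below searches for the convex combination of a few candidate estimates that best matches a target vector. It is not the formulation used in the article: the least-squares objective, the projected-gradient solver, and the names `project_to_simplex` and `refine_on_simplex` are all assumptions introduced here for illustration.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0                  # cumulative sums shifted by the target sum
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    tau = css[rho] / (rho + 1)                # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)


def refine_on_simplex(candidates, target, iters=500):
    """Hypothetical refinement: minimize ||candidates @ w - target||^2 over the simplex.

    candidates: (d, k) array whose columns are local estimates (assumed nonzero).
    target:     (d,) array standing in for the quantity to be matched.
    """
    k = candidates.shape[1]
    w = np.full(k, 1.0 / k)                   # start at the barycenter
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    lipschitz = 2.0 * np.linalg.norm(candidates, 2) ** 2
    for _ in range(iters):
        grad = 2.0 * candidates.T @ (candidates @ w - target)
        w = project_to_simplex(w - grad / lipschitz)
    return w, candidates @ w


if __name__ == "__main__":
    # Three candidate estimates in 2D: the vertices of a triangle.
    candidates = np.array([[0.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0]])
    weights, refined = refine_on_simplex(candidates, np.array([0.25, 0.25]))
    print(weights, refined)
```

Because the weights are constrained to be non-negative and to sum to one, the refined result always stays inside the convex hull of the candidates, which is the property the simplicial constraint is meant to guarantee.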
