Fast Online Upper Body Pose Estimation from Video

Estimation of human body poses from video is an important problem in computer vision with many applications. Most existing methods for video pose estimation are offline in nature, where all frames in the video are used in the process to estimate the body pose in each frame. In this work, we describe a fast online video upper body pose estimation method (CDBN-MODEC) that is based on a conditional dynamic Bayesian network model, which predicts upper body pose in a frame without using information from future frames. Our method combines fast single image based pose estimation methods with the temporal correlation of poses between frames. We collect a new high frame rate upper body pose dataset that better reflects practical scenarios calling for fast online video pose estimation. When evaluated on this dataset and the VideoPose2 benchmark dataset, CDBN-MODEC achieves improvements in both performance and running efficiency over several state-of-art online video pose estimation methods.

[1]  Stuart J. Russell,et al.  Dynamic bayesian networks: representation, inference and learning , 2002 .

[2]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[3]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[4]  Vittorio Ferrari,et al.  Appearance Sharing for Collective Human Pose Estimation , 2012, ACCV.

[5]  Andrew Zisserman,et al.  2D Human Pose Estimation in TV Shows , 2009, Statistical and Geometrical Approaches to Visual Motion Analysis.

[6]  Ben Taskar,et al.  MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  David A. Forsyth,et al.  Strike a pose: tracking people by finding stylized poses , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Ben Taskar,et al.  Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[9]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[11]  Ben Taskar,et al.  Dynamic Structured Model Selection , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Jonathan Tompson,et al.  MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation , 2014, ACCV.

[13]  Alan L. Yuille,et al.  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[14]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[15]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Cordelia Schmid,et al.  Estimating Human Pose with Flowing Puppets , 2013, 2013 IEEE International Conference on Computer Vision.

[17]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[18]  Seunghoon Hong,et al.  Joint Segmentation and Pose Tracking of Human in Natural Videos , 2013, 2013 IEEE International Conference on Computer Vision.

[19]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Ben Taskar,et al.  Parsing human motion with stretchable models , 2011, CVPR 2011.

[21]  Cordelia Schmid,et al.  Mixing Body-Part Sequences for Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Katerina Fragkiadaki,et al.  Pose from Flow and Flow from Pose , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.