Finding Human Poses in Videos Using Concurrent Matching and Segmentation

We propose a novel method to detect human poses in videos by concurrently optimizing body part matching and object segmentation. With a single exemplar image, the proposed method detects the poses of a specific human subject in long video sequences. Matching and segmentation support each other and therefore the simultaneous optimization enables more reliable results. However, efficient concurrent optimization is a great challenge due to its huge search space. We propose an efficient linear method that solves the problem. In this method, the optimal body part matching conforms to local appearances and a human body plan, and the body part configuration is consistent with the object foreground estimated by simultaneous superpixel labeling. Our experiments on a variety of videos show that the proposed method is efficient and more reliable than previous locally constrained approaches.

[1]  Stan Sclaroff,et al.  Fast globally optimal 2D human detection with loopy graph models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Greg Mori,et al.  Guiding model search using segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[3]  Andrew J. Davison,et al.  Active Matching , 2008, ECCV.

[4]  Jitendra Malik,et al.  Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[5]  Cheng Chen,et al.  Hybrid body representation for integrated pose recognition, localization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Mark Everingham,et al.  Combining discriminative appearance and segmentation cues for articulated human pose estimation , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[7]  Pushmeet Kohli,et al.  Simultaneous Segmentation and Pose Estimation of Humans Using Dynamic Graph Cuts , 2008, International Journal of Computer Vision.

[8]  Mun Wai Lee,et al.  Human Upper Body Pose Estimation in Static Images , 2004, ECCV.

[9]  Rómer Rosales,et al.  Inferring body pose without tracking body parts , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[10]  Larry S. Davis,et al.  Constraint Integration for Efficient Multiview Pose Estimation with Self-Occlusions , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  A. Yezzi,et al.  A variational framework for joint segmentation and registration , 2001, Proceedings IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2001).

[12]  Hao Jiang,et al.  Human Pose Estimation Using Consistent Max Covering , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Yang Wang,et al.  Multiple Tree Models for Occlusion and Spatial Constraints in Human Pose Estimation , 2008, ECCV.

[14]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[15]  David A. Forsyth,et al.  Probabilistic Methods for Finding People , 2001, International Journal of Computer Vision.

[16]  Jitendra Malik,et al.  Recovering human body configurations using pairwise constraints between parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  David J. Fleet,et al.  Temporal motion models for monocular and multiview 3D human body tracking , 2006, Comput. Vis. Image Underst..

[18]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, CVPR.

[19]  David A. Forsyth,et al.  Strike a pose: tracking people by finding stylized poses , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20]  Jiří Matas,et al.  Computer Vision - ECCV 2004 , 2004, Lecture Notes in Computer Science.

[21]  Jitendra Malik,et al.  Twist Based Acquisition and Tracking of Animal and Human Kinematics , 2004, International Journal of Computer Vision.

[22]  Mads Nielsen,et al.  Computer Vision — ECCV 2002 , 2002, Lecture Notes in Computer Science.

[23]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[24]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[25]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[26]  Cristian Sminchisescu,et al.  Estimating Articulated Human Motion with Covariance Scaled Sampling , 2003, Int. J. Robotics Res..

[27]  Andrew Blake,et al.  Probabilistic Tracking with Exemplars in a Metric Space , 2002, International Journal of Computer Vision.

[28]  Dariu Gavrila,et al.  A Bayesian, Exemplar-Based Approach to Hierarchical Shape Matching , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Andrew Zisserman,et al.  Pose search: Retrieving people using their pose , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Hao Jiang,et al.  Global pose estimation using non-tree models , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Michael J. Black,et al.  Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).