Paper Information - MODEC: Multimodal Decomposable Models for Human Pose Estimation


We propose a multimodal, decomposable model for articulated human pose estimation in monocular images. A typical approach to this problem is to use a linear structured model, which struggles to capture the wide range of appearance present in realistic, unconstrained images. In this paper, we instead propose a model of human pose that explicitly captures a variety of pose modes. Unlike other multimodal models, our approach includes both global and local pose cues and uses a convex objective and joint training for mode selection and pose estimation. We also employ a cascaded mode selection step which controls the trade-off between speed and accuracy, yielding a 5x speedup in inference and learning. Our model outperforms state-of-the-art approaches across the accuracy-speed trade-off curve for several pose datasets. This includes our newly-collected dataset of people in movies, FLIC, which contains an order of magnitude more labeled data for training and testing than existing datasets.
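The cascaded mode selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `cascaded_mode_selection`, the parameter `keep`, and the callable `local_score_fn` are hypothetical, and the scores are plain numbers rather than learned quantities. It assumes only that a cheap global score ranks the pose modes, that a more expensive local model is run only for the surviving modes, and that the final mode maximizes the combined score.

```python
from typing import Callable, Sequence, Tuple


def cascaded_mode_selection(
    global_scores: Sequence[float],
    local_score_fn: Callable[[int], float],
    keep: int,
) -> Tuple[int, float, int]:
    """Pick a pose mode with a two-stage cascade (illustrative sketch).

    global_scores  : cheap per-mode scores from global cues (assumed given).
    local_score_fn : expensive per-mode score from local cues (hypothetical).
    keep           : number of top-ranked modes that survive pruning.

    Returns (best_mode, best_total_score, number_of_local_evaluations).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    # Stage 1: rank all modes by the cheap global score and prune.
    ranked = sorted(
        range(len(global_scores)),
        key=lambda m: global_scores[m],
        reverse=True,
    )
    survivors = ranked[:keep]

    # Stage 2: run the expensive local model only on the survivors.
    best_mode, best_total, evaluations = -1, float("-inf"), 0
    for m in survivors:
        total = global_scores[m] + local_score_fn(m)
        evaluations += 1
        if total > best_total:
            best_mode, best_total = m, total
    return best_mode, best_total, evaluations


# Toy numbers, invented for illustration only.
toy_global = [0.2, 1.5, 1.0, -0.3]
toy_local = [5.0, 0.1, 1.2, 9.0]

fast = cascaded_mode_selection(toy_global, lambda m: toy_local[m], keep=2)
full = cascaded_mode_selection(toy_global, lambda m: toy_local[m], keep=4)
```

With `keep=2` only two local evaluations are performed, at the risk of pruning the mode that the full search would pick; with `keep=4` every mode is evaluated. Varying `keep` is one simple way to move along a speed-accuracy trade-off of the kind the abstract mentions.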

Ben Taskar | Benjamin Sapp
