Efficient Multi-Person Pose Estimation with Provable Guarantees

Multi-person pose estimation (MPPE) in natural images is key to the meaningful use of visual data in many fields, including movement science, security, and rehabilitation. In this paper we tackle MPPE with a bottom-up approach, starting with candidate detections of body parts from a convolutional neural network (CNN) and grouping them into people. We formulate the grouping of body part detections into people as a minimum-weight set packing (MWSP) problem in which the set of potential people is the power set of body part detections. We model the quality of a person hypothesis, which is a set in the MWSP, by an augmented tree-structured Markov random field whose variables correspond to body parts and whose state spaces correspond to the power set of the detections for that part. We describe a novel algorithm that combines efficiency with provable bounds on this MWSP problem. We employ an implicit column generation strategy in which the pricing problem is formulated as a dynamic program. To solve this dynamic program efficiently, we exploit the problem structure with a nested Benders decomposition (NBD) exact inference strategy, which we speed up by recycling Benders rows between calls to the pricing problem. We test our approach on the MPII-Multiperson dataset, showing that it obtains results comparable to the state-of-the-art algorithm for joint node labeling and grouping problems, and that NBD achieves considerable speed-ups relative to a naive dynamic programming approach. Typical algorithms that solve joint node labeling and grouping problems use heuristics and thus cannot obtain proofs of optimality. Our approach, in contrast, proves that we find the globally optimal solution for over 99 percent of problem instances, and otherwise provides upper and lower bounds.
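To make the set packing formulation concrete, the following is a minimal sketch of MWSP on a toy instance, solved by brute-force enumeration. It is not the column generation or NBD algorithm described above; the function name `min_weight_set_packing`, the detection indices, and the costs are all hypothetical and chosen only for illustration. Each candidate "person" is a set of detection indices with a cost (more negative means a better hypothesis), and a valid packing may use each detection at most once.

```python
from itertools import combinations


def min_weight_set_packing(candidates):
    """Brute-force MWSP over a small list of (detection_set, cost) pairs.

    Returns (best_cost, chosen_indices). The empty packing has cost 0,
    so a candidate is only selected if it lowers the total cost.
    Exponential in len(candidates); suitable for toy instances only.
    """
    best_cost, best_choice = 0.0, ()
    n = len(candidates)
    for r in range(1, n + 1):
        for choice in combinations(range(n), r):
            used = set()
            feasible = True
            for i in choice:
                detections = candidates[i][0]
                if used & detections:
                    # Two selected people share a detection: not a packing.
                    feasible = False
                    break
                used |= detections
            if not feasible:
                continue
            cost = sum(candidates[i][1] for i in choice)
            if cost < best_cost:
                best_cost, best_choice = cost, choice
    return best_cost, best_choice


# Hypothetical toy instance: five detections (0..4), four person hypotheses.
toy_candidates = [
    (frozenset({0, 1, 2}), -3.0),
    (frozenset({2, 3}), -2.5),
    (frozenset({3, 4}), -2.0),
    (frozenset({0, 1}), -1.5),
]

best_cost, best_choice = min_weight_set_packing(toy_candidates)
print(best_cost, best_choice)
```

In the setting of the abstract the candidate list would be the power set of all detections, which is far too large to enumerate; that is the motivation for generating columns implicitly through a pricing problem instead.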
