Parsing Occluded People

Occlusion poses a significant difficulty for object recognition due to the combinatorial diversity of possible occlusion patterns. We take a strongly supervised, non-parametric approach to modeling occlusion by learning deformable models with many local part mixture templates using large quantities of synthetically generated training data. This allows the model to learn the appearance of different occlusion patterns including figure-ground cues such as the shapes of occluding contours as well as the co-occurrence statistics of occlusion between neighboring parts. The underlying part mixture-structure also allows the model to capture coherence of object support masks between neighboring parts and make compelling predictions of figure-ground-occluder segmentations. We test the resulting model on human pose estimation under heavy occlusion and find it produces improved localization accuracy.

[1]  Y. LeCun,et al.  Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[2]  王晓刚 Single-Pedestrian Detection aided by Multi-pedestrian Detection , 2013 .

[3]  Ali Farhadi,et al.  Recognition using visual phrases , 2011, CVPR 2011.

[4]  Martial Hebert,et al.  Occlusion reasoning for object detection under arbitrary viewpoint , 2012, CVPR.

[5]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[6]  Yi Yang,et al.  Layered Object Models for Image Segmentation , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Charless C. Fowlkes,et al.  Do We Need More Training Data or Better Models for Object Detection? , 2012, BMVC.

[8]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[10]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  Vittorio Ferrari,et al.  We Are Family: Joint Pose Estimation of Multiple Persons , 2010, ECCV.

[12]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Alexei A. Efros,et al.  How Important Are "Deformable Parts" in the Deformable Parts Model? , 2012, ECCV Workshops.

[14]  David A. McAllester,et al.  Object Detection with Grammar Models , 2011, NIPS.

[15]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Subhransu Maji,et al.  Object segmentation by alignment of poselet activations to image contours , 2011, CVPR 2011.

[17]  Daphne Koller,et al.  A segmentation-aware object detection model with occlusion handling , 2011, CVPR 2011.

[18]  Ivan Laptev,et al.  Object Detection Using Strongly-Supervised Deformable Part Models , 2012, ECCV.

[19]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[21]  Yi Yang,et al.  Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Yi Yang,et al.  Recognizing proxemics in personal photos , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Bernt Schiele,et al.  Detection and Tracking of Occluded People , 2014, International Journal of Computer Vision.

[24]  Bernt Schiele,et al.  Articulated people detection and pose estimation: Reshaping the future , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Deva Ramanan,et al.  Detecting Actions, Poses, and Objects with Relational Phraselets , 2012, ECCV.

[26]  Andrew Zisserman,et al.  Structured output regression for detection with partial truncation , 2009, NIPS.

[27]  Pietro Perona,et al.  Towards automatic discovery of object categories , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).