Part-Pair Representation for Part Localization

In this paper, we propose a novel part-pair representation for part localization. In this representation, an object is treated as a collection of part pairs to model its shape and appearance. By changing the set of pairs to be used, we are able to impose either stronger or weaker geometric constraints on the part configuration. As for the appearance, we build pair detectors for each part pair, which model the appearance of an object at different levels of granularities. Our method of part localization exploits the part-pair representation, featuring the combination of non-parametric exemplars and parametric regression models. Non-parametric exemplars help generate reliable part hypotheses from very noisy pair detections. Then, the regression models are used to group the part hypotheses in a flexible way to predict the part locations. We evaluate our method extensively on the dataset CUB-200-2011 [32], where we achieve significant improvement over the state-of-the-art method on bird part localization. We also experiment with human pose estimation, where our method produces comparable results to existing works.

[1]  Alexei A. Efros,et al.  Unsupervised Discovery of Mid-Level Discriminative Patches , 2012, ECCV.

[2]  Fred Nicolls,et al.  Locating Facial Features with an Extended Active Shape Model , 2008, ECCV.

[3]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.

[4]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Peter V. Gehler,et al.  Strong Appearance and Expressive Spatial Models for Human Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[6]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[7]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[9]  Feng Zhou,et al.  Exemplar-Based Graph Matching for Robust Facial Landmark Localization , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Pietro Perona,et al.  Integral Channel Features , 2009, BMVC.

[11]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[12]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[13]  Xiaogang Wang,et al.  Joint Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[14]  Pietro Perona,et al.  The Fastest Pedestrian Detector in the West , 2010, BMVC.

[15]  Silvio Savarese,et al.  Articulated part-based model for joint object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[16]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[18]  Sanja Fidler,et al.  Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Ivan Laptev,et al.  Object Detection Using Strongly-Supervised Deformable Part Models , 2012, ECCV.

[21]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[22]  Andrew J. Davison,et al.  Active Matching , 2008, ECCV.

[23]  Andrew Zisserman,et al.  Hello! My name is... Buffy'' -- Automatic Naming of Characters in TV Video , 2006, BMVC.

[24]  Peter N. Belhumeur,et al.  Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency , 2013, 2013 IEEE International Conference on Computer Vision.

[25]  Yang Wang,et al.  Learning hierarchical poselets for human parsing , 2011, CVPR 2011.

[26]  Peter N. Belhumeur,et al.  POOF: Part-Based One-vs.-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  David J. Kriegman,et al.  Localizing parts of faces using a consensus of exemplars , 2011, CVPR.

[28]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[29]  Thomas Vetter,et al.  Optimal landmark detection using shape models and branch and bound , 2011, 2011 International Conference on Computer Vision.

[30]  Piotr Dollár,et al.  Crosstalk Cascades for Frame-Rate Pedestrian Detection , 2012, ECCV.

[31]  Julee Cobb,et al.  Hello, My Name Is… , 2016 .

[32]  Dumitru Erhan,et al.  Deep Neural Networks for Object Detection , 2013, NIPS.

[33]  Peter V. Gehler,et al.  Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[35]  Steve Branson,et al.  Efficient Large-Scale Structured Learning , 2013, CVPR.

[36]  Thomas Deselaers,et al.  ClassCut for Unsupervised Class Segmentation , 2010, ECCV.

[37]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[38]  Timothy F. Cootes,et al.  Feature Detection and Tracking with Constrained Local Models , 2006, BMVC.

[39]  Deva Ramanan,et al.  Histograms of Sparse Codes for Object Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Matthieu Guillaumin,et al.  Segmentation Propagation in ImageNet , 2012, ECCV.