
Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency

In this paper, we propose a novel approach to bird part localization. It targets fine-grained categories whose appearance varies widely with pose (including aspect and orientation) and with subcategory. Because such variation is hard to capture across a large, diverse set of samples with tractable parametric models, we turn to individual exemplars. Specifically, we extend the exemplar-based models of [12] by enforcing pose and subcategory consistency at the parts. During training, we build two kinds of detectors: pose-specific detectors that score part poses across subcategories, and subcategory-specific detectors that score part appearance across poses. At test time, likely exemplars are matched to the image. These matches suggest part locations whose pose and subcategory consistency is well supported by the image cues. From these hypotheses, the part configuration can be predicted with very high accuracy. Experiments on the extensive CUB-200-2011 dataset [4] show significant performance gains from our method for both localization and classification.
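The matching step described above can be illustrated with a small NumPy sketch of exemplar-based localization in the spirit of the consensus-of-exemplars model [12]. It is not the authors' implementation, and it rests on several assumptions:

- The response maps `pose_maps` and `sub_maps` are hypothetical stand-ins for the pose-specific and subcategory-specific detectors.
- The sum-of-log-responses score is an assumed way to combine those detectors.
- The search over exemplar transforms is simplified to a grid of scales and translations.
- The final Gaussian voting step ignores the detector-weighting term that a full consensus model may include.

Each exemplar is scored only with the detectors that match its own pose and subcategory labels. The best-scoring placements then vote for each part location.

```python
import numpy as np


def transform_parts(parts, scale, offset):
    """Map normalized exemplar part coordinates (P, 2), stored as (x, y), to image pixels."""
    return parts * scale + offset


def sample_response(maps, locs, floor=1e-6):
    """Read per-part response maps (P, H, W) at rounded (x, y) locations (P, 2).

    Parts that land outside the image receive `floor`, which heavily penalizes them.
    """
    num_parts, height, width = maps.shape
    xs = np.round(locs[:, 0]).astype(int)
    ys = np.round(locs[:, 1]).astype(int)
    inside = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
    values = np.full(num_parts, floor)
    idx = np.nonzero(inside)[0]
    values[idx] = maps[idx, ys[idx], xs[idx]]
    return np.maximum(values, floor)


def score_hypothesis(parts, pose_id, sub_id, scale, offset, pose_maps, sub_maps):
    """Score one transformed exemplar using only the detectors that match its own
    pose and subcategory labels (an assumed reading of 'consistency')."""
    locs = transform_parts(parts, scale, offset)
    pose_resp = sample_response(pose_maps[pose_id], locs)
    sub_resp = sample_response(sub_maps[sub_id], locs)
    return float(np.sum(np.log(pose_resp) + np.log(sub_resp))), locs


def localize_parts(exemplars, pose_ids, sub_ids, pose_maps, sub_maps,
                   scales, offsets, top_k=5, sigma=2.0):
    """Predict (x, y) for every part by Gaussian voting over the top_k hypotheses.

    exemplars: (E, P, 2) normalized part locations of the labeled exemplars
    pose_ids, sub_ids: length-E pose and subcategory labels of the exemplars
    pose_maps: (num_poses, P, H, W) hypothetical pose-specific part responses
    sub_maps: (num_subcategories, P, H, W) hypothetical subcategory-specific responses
    """
    hypotheses = []
    for k, parts in enumerate(exemplars):
        for scale in scales:
            for offset in offsets:
                score, locs = score_hypothesis(parts, pose_ids[k], sub_ids[k], scale,
                                               np.asarray(offset, dtype=float),
                                               pose_maps, sub_maps)
                hypotheses.append((score, locs))
    # Keep the exemplar placements best supported by the consistent detectors.
    hypotheses.sort(key=lambda h: h[0], reverse=True)
    best = hypotheses[:top_k]

    num_parts = exemplars.shape[1]
    height, width = pose_maps.shape[2:]
    ys, xs = np.mgrid[0:height, 0:width]
    predictions = np.zeros((num_parts, 2))
    for i in range(num_parts):
        # Each retained hypothesis casts a Gaussian vote around its location for part i.
        votes = np.zeros((height, width))
        for _, locs in best:
            d2 = (xs - locs[i, 0]) ** 2 + (ys - locs[i, 1]) ** 2
            votes += np.exp(-d2 / (2.0 * sigma ** 2))
        row, col = np.unravel_index(np.argmax(votes), votes.shape)
        predictions[i] = (col, row)
    return predictions
```

With `top_k=1` the sketch reduces to picking the single best-matching exemplar. A larger `top_k` lets several well-supported exemplars vote, which is what makes the prediction a consensus rather than a single match.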

Jiongxin Liu | Peter N. Belhumeur

[1] Ivan Laptev et al. Object Detection Using Strongly-Supervised Deformable Part Models. ECCV, 2012.

[2] Jian Sun et al. Face Alignment by Explicit Shape Regression. IJCV, 2012.

[3] Jitendra Malik et al. Poselets: Body part detectors trained using 3D human pose annotations. ICCV, 2009.

[4] Pietro Perona et al. The Caltech-UCSD Birds-200-2011 Dataset. 2011.

[5] Silvio Savarese et al. Articulated part-based model for joint object detection and pose estimation. ICCV, 2011.

[6] Andrea Vedaldi et al. VLFeat: An open and portable library of computer vision algorithms. ACM Multimedia, 2010.

[7] Katja Markert et al. Learning Models for Object Recognition from Natural Language Descriptions. BMVC, 2009.

[8] Timothy F. Cootes et al. Active Appearance Models. IEEE TPAMI, 2001.

[9] Alexei A. Efros et al. How Important Are "Deformable Parts" in the Deformable Parts Model? ECCV Workshops, 2012.

[10] C. V. Jawahar et al. Cats and dogs. CVPR, 2012.

[11] Alexei A. Efros et al. Ensemble of exemplar-SVMs for object detection and beyond. ICCV, 2011.

[12] David J. Kriegman et al. Localizing parts of faces using a consensus of exemplars. CVPR, 2011.

[13] Thomas Vetter et al. Optimal landmark detection using shape models and branch and bound. ICCV, 2011.

[14] Gary R. Bradski et al. A codebook-free and annotation-free approach for fine-grained image categorization. CVPR, 2012.

[15] Pietro Perona et al. Strong supervision from weak annotation: Interactive training of deformable part models. ICCV, 2011.

[16] Pietro Perona et al. Multiclass recognition and part localization with humans in the loop. ICCV, 2011.

[17] Andrew Zisserman et al. "Hello! My name is... Buffy" - Automatic Naming of Characters in TV Video. BMVC, 2006.

[18] Jitendra Malik et al. Semantic segmentation using regions and parts. CVPR, 2012.

[19] Erica Klarreich et al. "Hello, my name is…". CACM, 2014.

[20] Larry S. Davis et al. Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance. ICCV, 2011.

[21] Subhransu Maji et al. Detecting People Using Mutually Consistent Poselet Activations. ECCV, 2010.

[22] Simon Baker et al. Active Appearance Models Revisited. IJCV, 2004.

[23] Yi Yang et al. Articulated pose estimation with flexible mixtures-of-parts. CVPR, 2011.

[24] Simon Lucey et al. Face alignment through subspace constrained mean-shifts. ICCV, 2009.

[25] Fred Nicolls et al. Locating Facial Features with an Extended Active Shape Model. ECCV, 2008.

[26] Norimichi Ukita. Articulated pose estimation with parts connectivity using discriminative local oriented contours. CVPR, 2012.

[27] Deva Ramanan et al. Face detection, pose estimation, and landmark localization in the wild. CVPR, 2012.

[28] David W. Jacobs et al. Dog Breed Classification Using Part Localization. ECCV, 2012.

[29] Bill Triggs et al. Histograms of oriented gradients for human detection. CVPR, 2005.

[30] William T. Freeman et al. Latent hierarchical structural learning for object detection. CVPR, 2010.

[31] David A. McAllester et al. Object Detection with Discriminatively Trained Part Based Models. IEEE TPAMI, 2010.

[32] Peter N. Belhumeur et al. POOF: Part-Based One-vs.-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation. CVPR, 2013.

[33] Luc Van Gool et al. Real-time facial feature detection using conditional regression forests. CVPR, 2012.

[34] Daniel P. Huttenlocher et al. Pictorial Structures for Object Recognition. IJCV, 2004.

[35] Chih-Jen Lin et al. LIBSVM: A library for support vector machines. ACM TIST, 2011.

[36] Trevor Darrell et al. Pose pooling kernels for sub-category recognition. CVPR, 2012.