Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance

Subordinate-level categorization typically rests on establishing salient distinctions between part-level characteristics of objects, in contrast to basic-level categorization, where the presence or absence of parts is determinative. We develop an approach for subordinate categorization in vision, focusing on an avian domain due to the fine-grained structure of the category taxonomy for this domain. We explore a pose-normalized appearance model based on a volumetric poselet scheme. The variation in shape and appearance properties of these parts across a taxonomy provides the cues needed for subordinate categorization. Training pose detectors requires a relatively large amount of training data per category when done from scratch; using a subordinate-level approach, we exploit a pose classifier trained at the basic-level, and extract part appearance and shape information to build subordinate-level models. Our model associates the underlying image pattern parameters used for detection with corresponding volumetric part location, scale and orientation parameters. These parameters implicitly define a mapping from the image pixels into a pose-normalized appearance space, removing view and pose dependencies, facilitating fine-grained categorization from relatively few training examples.

[1]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[2]  Wayne D. Gray,et al.  Basic objects in natural categories , 1976, Cognitive Psychology.

[3]  D. Marr,et al.  Representation and recognition of the spatial organization of three-dimensional shapes , 1978, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[4]  J. Gibson The Ecological Approach to Visual Perception , 1979 .

[5]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[6]  M. Burl,et al.  Face Localization via Shape Statistics , 1995 .

[7]  M. Tarr Rotating objects to recognize them: A case study on the role of viewpoint dependency in the recognition of three-dimensional objects , 1995, Psychonomic bulletin & review.

[8]  Isabel Gauthier,et al.  Three-dimensional object recognition is viewpoint dependent , 1998, Nature Neuroscience.

[9]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[10]  I. Biederman,et al.  Subordinate-level object classification reexamined , 1999, Psychological research.

[11]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[12]  Paul A. Viola,et al.  Learning from one example through shared densities on transforms , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[13]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[14]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[15]  Michael Jones,et al.  Multidimensional Morphable Models: A Framework for Representing and Matching Object Classes , 2004, International Journal of Computer Vision.

[16]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[17]  Pietro Perona,et al.  A sparse object category model for efficient learning and exhaustive recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  Erik G. Learned-Miller,et al.  Building a classification cascade for visual identification from one example , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[19]  Daniel P. Huttenlocher,et al.  Spatial priors for part-based recognition using statistical models , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Daphna Weinshall,et al.  Subordinate class recognition using relational object models , 2006, NIPS.

[22]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Motorcycles Faces Guitars Subordinate class recognition using relational object models , 2006 .

[24]  Andrew Zisserman,et al.  A Visual Vocabulary for Flower Classification , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[25]  Daphna Weinshall,et al.  Exploiting Object Hierarchy: Combining Models from Different Category Levels , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[26]  Hsuan-Tien Lin,et al.  A note on Platt’s probabilistic outputs for support vector machines , 2007, Machine Learning.

[27]  Pietro Perona,et al.  Unsupervised learning of visual taxonomies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[29]  Pietro Perona,et al.  Learning and using taxonomies for fast visual categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Narendra Ahuja,et al.  Learning subcategory relevances for category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Cordelia Schmid,et al.  Constructing Category Hierarchies for Visual Recognition , 2008, ECCV.

[32]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[34]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[35]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[36]  Steve Branson,et al.  Similarity metrics for categorization: From monolithic to category specific , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[37]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[38]  Cordelia Schmid,et al.  Bandit Algorithms for Tree Search , 2007, UAI.

[39]  Thomas G. Dietterich,et al.  Dictionary-free categorization of very similar objects via stacked evidence trees , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Gang Wang,et al.  Joint learning of visual attributes, object classes and visual saliency , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[41]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[43]  Jia Deng,et al.  A large-scale hierarchical image database , 2009, CVPR 2009.

[44]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Ali Farhadi,et al.  Attribute-centric recognition for cross-category generalization , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[46]  Fei-Fei Li,et al.  What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.

[47]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[48]  Alexander C. Berg,et al.  Automatic Attribute Discovery and Characterization from Noisy Web Data , 2010, ECCV.

[49]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[50]  Pietro Perona,et al.  Visual Recognition with Humans in the Loop , 2010, ECCV.

[51]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[52]  Antonio Torralba,et al.  Semantic Label Sharing for Learning with Many Categories , 2010, ECCV.

[53]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.