Learning Features and Parts for Fine-Grained Recognition

This paper addresses the problem of fine-grained recognition: recognizing subordinate categories such as bird species, car models, or dog breeds. We focus on two major challenges: learning expressive appearance descriptors and localizing discriminative parts. To this end, we propose an object representation that detects important parts and describes fine grained appearances. The part detectors are learned in a fully unsupervised manner, based on the insight that images with similar poses can be automatically discovered for fine-grained classes in the same domain. The appearance descriptors are learned using a convolutional neural network. Our approach requires only image level class labels, without any use of part annotations or segmentation masks, which may be costly to obtain. We show experimentally that combining these two insights is an effective strategy for fine-grained recognition.

[1]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[2]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[3]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[4]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[5]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Motorcycles Faces Guitars Subordinate class recognition using relational object models , 2006 .

[8]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Andrew Zisserman,et al.  Delving into the Whorl of Flower Segmentation , 2007, BMVC.

[10]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[15]  Pietro Perona,et al.  Visual Recognition with Humans in the Loop , 2010, ECCV.

[16]  Larry S. Davis,et al.  Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance , 2011, 2011 International Conference on Computer Vision.

[17]  Fei-Fei Li,et al.  Combining randomization and discrimination for fine-grained image categorization , 2011, CVPR 2011.

[18]  Kristen Grauman,et al.  Interactively building a discriminative vocabulary of nameable attributes , 2011, CVPR 2011.

[19]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[20]  Pietro Perona,et al.  Multiclass recognition and part localization with humans in the loop , 2011, 2011 International Conference on Computer Vision.

[21]  Kun Duan,et al.  Discovering localized attributes for fine-grained recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Honglak Lee,et al.  Learning to Align from Scratch , 2012, NIPS.

[23]  Luc Van Gool,et al.  TriCoS: A Tri-level Class-Discriminative Co-segmentation Method for Image Classification , 2012, ECCV.

[24]  C. V. Jawahar,et al.  Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Trevor Darrell,et al.  Pose pooling kernels for sub-category recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Honglak Lee,et al.  Learning hierarchical representations for face verification with convolutional deep belief networks , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Gary R. Bradski,et al.  A codebook-free and annotation-free approach for fine-grained image categorization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[29]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[30]  Linda G. Shapiro,et al.  Unsupervised Template Learning for Fine-Grained Object Recognition , 2012, NIPS.

[31]  David W. Jacobs,et al.  Dog Breed Classification Using Part Localization , 2012, ECCV.

[32]  James J. Little,et al.  Fine-Grained Categorization for 3D Scene Understanding , 2012, BMVC.

[33]  Jia Deng,et al.  Large scale visual recognition , 2012 .

[34]  Arnold W. M. Smeulders,et al.  Fine-Grained Categorization by Alignments , 2013, 2013 IEEE International Conference on Computer Vision.

[35]  Andrew Zisserman,et al.  Symbiotic Segmentation and Part Localization for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision.

[36]  Shenghuo Zhu,et al.  Efficient Object Detection and Segmentation for Fine-Grained Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Peter N. Belhumeur,et al.  POOF: Part-Based One-vs.-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[39]  Jonathan Krause,et al.  Fine-Grained Crowdsourcing for Fine-Grained Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[41]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.