Coarse-to-Fine Description for Fine-Grained Visual Categorization

Recent years have witnessed the significant advance in fine-grained visual categorization, which targets to classify the objects belonging to the same species. To capture enough subtle visual differences and build discriminative visual description, most of the existing methods heavily rely on the artificial part annotations, which are expensive to collect in real applications. Motivated to conquer this issue, this paper proposes a multi-level coarse-to-fine object description. This novel description only requires the original image as input, but could automatically generate visual descriptions discriminative enough for fine-grained visual categorization. This description is extracted from five sources representing coarse-to-fine visual clues: 1) original image is used as the source of global visual clue; 2) object bounding boxes are generated using convolutional neural network (CNN); 3) with the generated bounding box, foreground is segmented using the proposed k nearest neighbour-based co-segmentation algorithm; and 4) two types of part segmentations are generated by dividing the foreground with an unsupervised part learning strategy. The final description is generated by feeding these sources into CNN models and concatenating their outputs. Experiments on two public benchmark data sets show the impressive performance of this coarse-to-fine description, i.e., classification accuracy achieves 82.5% on CUB-200-2011, and 86.9% on fine-grained visual categorization-Aircraft, respectively, which outperform many recent works.

[1]  Edsger W. Dijkstra,et al.  A note on two problems in connexion with graphs , 1959, Numerische Mathematik.

[2]  J. IIVARINENHelsinki Efficiency of Simple Shape Descriptors , 1997 .

[3]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[4]  Charles A. Collin,et al.  Subordinate-level categorization relies on high spatial frequencies to a greater degree than basic-level categorization , 2005, Perception & psychophysics.

[5]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[6]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[7]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[9]  Jitendra Malik,et al.  From contours to regions: An empirical evaluation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[12]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[13]  Wenyu Liu,et al.  Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning , 2011, NIPS.

[14]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[15]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[16]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[17]  Linda G. Shapiro,et al.  Unsupervised Template Learning for Fine-Grained Object Recognition , 2012, NIPS.

[18]  Arnold W. M. Smeulders,et al.  Fine-Grained Categorization by Alignments , 2013, 2013 IEEE International Conference on Computer Vision.

[19]  Andrew Zisserman,et al.  Symbiotic Segmentation and Part Localization for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Peter N. Belhumeur,et al.  Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency , 2013, 2013 IEEE International Conference on Computer Vision.

[21]  Forrest N. Iandola,et al.  Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction , 2013, 2013 IEEE International Conference on Computer Vision.

[22]  Shenghuo Zhu,et al.  Efficient Object Detection and Segmentation for Fine-Grained Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Peter N. Belhumeur,et al.  POOF: Part-Based One-vs.-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[25]  Qi Tian,et al.  Hierarchical Part Matching for Fine-Grained Visual Categorization , 2013, 2013 IEEE International Conference on Computer Vision.

[26]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[27]  Cordelia Schmid,et al.  Segmentation Driven Object Detection with Fisher Vectors , 2013, 2013 IEEE International Conference on Computer Vision.

[28]  Arnold W. M. Smeulders,et al.  Local Alignments for Fine-Grained Categorization , 2014, International Journal of Computer Vision.

[29]  Peter N. Belhumeur,et al.  Part-Pair Representation for Part Localization , 2014, ECCV.

[30]  Pietro Perona,et al.  Bird Species Categorization Using Pose Normalized Deep Convolutional Nets , 2014, ArXiv.

[31]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[33]  Liang Lin,et al.  Deep Joint Task Learning for Generic Object Extraction , 2014, NIPS.

[34]  Ji Wan,et al.  Deep Learning for Content-Based Image Retrieval: A Comprehensive Study , 2014, ACM Multimedia.

[35]  Naila Murray,et al.  Revisiting the Fisher vector for fine-grained classification , 2014, Pattern Recognit. Lett..

[36]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[38]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[39]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[40]  Joachim Denzler,et al.  Nonparametric Part Transfer for Fine-Grained Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Lei Zhang,et al.  Bit-Scalable Deep Hashing With Regularized Similarity Learning for Image Retrieval and Person Re-Identification , 2015, IEEE Transactions on Image Processing.

[42]  Yongdong Zhang,et al.  Orientational Spatial Part Modeling for Fine-Grained Visual Categorization , 2015, 2015 IEEE International Conference on Mobile Services.

[43]  Cewu Lu,et al.  Deep LAC: Deep localization, alignment and classification for fine-grained recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Jian Sun,et al.  Convolutional feature masking for joint object and stuff segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[47]  Jian Dong,et al.  Deep Human Parsing with Active Template Regression , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Peter I. Corke,et al.  Subset feature learning for fine-grained category classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[49]  Yuxin Peng,et al.  The application of two-level attention models in deep convolutional neural network for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[51]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Marcel Simon,et al.  Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[53]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[54]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[55]  Xuelong Li,et al.  DISC: Deep Image Saliency Computing via Progressive Representation Learning , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[56]  Qi Tian,et al.  Fused One-vs-All Features With Semantic Alignments for Fine-Grained Visual Categorization , 2016, IEEE Transactions on Image Processing.

[57]  Xiang Bai,et al.  Script identification in the wild via discriminative convolutional neural network , 2016, Pattern Recognit..