Paper information - Fisher Vectors for Fine-Grained Visual Categorization

Fisher Vectors for Fine-Grained Visual Categorization

The bag-of-visual-words (BOV) is certainly the most popular image representation to date, and it has been shown to yield good results in various problems, including Fine-Grained Visual Categorization (FGVC) [3, 4]. Our contribution is to show that the Fisher Vector (FV), which describes an image by its deviation from an "average" model, is an excellent alternative to the BOV for the FGVC problem. In this extended abstract we first provide a brief introduction to the FV. We then present theoretical as well as practical motivations for using the FV for FGVC. Finally, we provide experimental results on four ImageNet subsets: fungus, ungulate, vehicle and ImageNet10K. Compared to [4], which uses spatial pyramid (SP) BOV representations, we report significantly higher classification accuracies. For instance, on ImageNet10K we report 16.7% vs. 6.4% top-1 accuracy (roughly a 160% relative improvement).
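To make "deviation from an 'average' model" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the "average" model is a Gaussian mixture with diagonal covariances and keeps only the gradient with respect to the component means, which is one common way the FV is formulated. The function name `fisher_vector_means` and its parameter names are hypothetical, and the power and L2 normalizations often applied afterwards are omitted.

```python
import numpy as np


def fisher_vector_means(X, weights, means, sigmas):
    """Mean-gradient part of a Fisher Vector (illustrative sketch).

    X       : (N, D) local descriptors extracted from one image
    weights : (K,)   mixture weights of the GMM (sum to 1)
    means   : (K, D) component means
    sigmas  : (K, D) per-dimension standard deviations (diagonal covariance)
    Returns a vector of length K * D.
    """
    N = X.shape[0]

    # Standardized deviation of every descriptor from every component: (N, K, D)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]

    # Log of w_k * N(x | mu_k, diag(sigma_k^2)), dropping the (2*pi)^(-D/2)
    # constant, which is the same for all components and cancels below.
    log_p = (
        np.log(weights)[None, :]
        - np.sum(np.log(sigmas), axis=1)[None, :]
        - 0.5 * np.sum(diff ** 2, axis=2)
    )

    # Soft assignments (posteriors) via a numerically stable softmax over k
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Posterior-weighted average deviation per component, scaled by 1/sqrt(w_k)
    G = (gamma[:, :, None] * diff).sum(axis=0) / (N * np.sqrt(weights)[:, None])
    return G.ravel()
```

Under these assumptions, an image whose descriptors all sit exactly at the component means yields an all-zero vector, and the entries grow as the descriptors drift away from the model. The output has K x D dimensions rather than the K dimensions of a BOV histogram built on the same vocabulary.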

[1] Fei-Fei Li, et al. What Does Classifying More Than 10,000 Image Categories Tell Us?, 2010, ECCV.

[2] Eli Shechtman, et al. In defense of Nearest-Neighbor based image classification, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[3] Florent Perronnin, et al. High-dimensional signature compression for large-scale image classification, 2011, CVPR 2011.

[4] Florent Perronnin, et al. Fisher Kernels on Visual Vocabularies for Image Categorization, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5] Pietro Perona, et al. Visual Recognition with Humans in the Loop, 2010, ECCV.

[6] Cordelia Schmid, et al. Product Quantization for Nearest Neighbor Search, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8] Thomas Mensink, et al. Improving the Fisher Kernel for Large-Scale Image Classification, 2010, ECCV.

[9] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004, International Journal of Computer Vision.