End-to-end Learning of a Fisher Vector Encoding for Part Features in Fine-grained Recognition

Part-based approaches for fine-grained recognition do not show the expected performance gain over global methods, although they can explicitly focus on small details that are relevant for distinguishing highly similar classes. We hypothesize that part-based methods suffer from the lack of a representation of local features that is invariant to the order of parts and handles a varying number of visible parts appropriately. The order of parts is artificial and is often given only by ground-truth annotations, whereas viewpoint variations and occlusions leave some parts unobservable. Therefore, we propose integrating a Fisher vector encoding of part features into convolutional neural networks. The parameters of this encoding are estimated jointly with those of the neural network in an end-to-end manner. Our approach improves state-of-the-art accuracies for bird species classification on CUB-200-2011 from 90.40% to 90.95%, on NA-Birds from 89.20% to 90.30%, and on Birdsnap from 84.30% to 86.97%.
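To illustrate why a Fisher vector gives an order-invariant encoding that tolerates missing parts, the sketch below computes a standard Fisher vector (gradients with respect to the means and standard deviations of a diagonal-covariance Gaussian mixture, followed by signed square-root and L2 normalization) over a set of part features. It is a minimal forward-pass sketch in plain Python, not the authors' implementation: the function names, the `visible` mask, and the toy mixture parameters are hypothetical, and the parameters are fixed here, whereas the approach described above learns them jointly with the network.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((xi - m) / s) ** 2
        for xi, m, s in zip(x, mean, std)
    )


def soft_assignments(x, weights, means, stds) -> Vector:
    """Posterior probability of each mixture component for feature x (log-sum-exp for stability)."""
    logs = [math.log(w) + log_gaussian_diag(x, m, s) for w, m, s in zip(weights, means, stds)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector(parts, visible, weights, means, stds) -> Vector:
    """Encode a set of part features into one vector of length 2*K*D.

    Hypothetical helper: `visible` marks which parts are observed; occluded parts are
    simply dropped. Summing over parts makes the result independent of part order,
    and dividing by the number of visible parts T handles a varying part count.
    """
    feats = [p for p, v in zip(parts, visible) if v]
    K, D = len(weights), len(means[0])
    if not feats:
        return [0.0] * (2 * K * D)  # no visible parts: return an all-zero encoding
    T = len(feats)
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    for x in feats:
        gamma = soft_assignments(x, weights, means, stds)
        for k in range(K):
            for d in range(D):
                z = (x[d] - means[k][d]) / stds[k][d]
                g_mu[k][d] += gamma[k] * z  # gradient w.r.t. the mean
                g_sigma[k][d] += gamma[k] * (z * z - 1.0)  # gradient w.r.t. the std
    fv: Vector = []
    for k in range(K):
        fv.extend(v / (T * math.sqrt(weights[k])) for v in g_mu[k])
    for k in range(K):
        fv.extend(v / (T * math.sqrt(2.0 * weights[k])) for v in g_sigma[k])
    # Signed square-root ("power") normalization, then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv
```

With this sketch, permuting the parts or appending an occluded part leaves the output unchanged, and the output length depends only on the number of mixture components and the feature dimension, not on how many parts are visible.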
