Improving Fine-Grained Visual Classification using Pairwise Confusion

Fine-Grained Visual Classification (FGVC) datasets contain small sample sizes, along with significant intra-class variation and inter-class similarity. While prior work has addressed intra-class variation using localization and segmentation techniques, the inter-class similarity may also affect feature learning and reduce classification performance. In this work, we address this problem using a novel optimization procedure for the end-to-end neural network training on FGVC tasks. This procedure, called Pairwise Confusion (PC) attempts to learn features with greater generalization, thereby preventing overfitting. This regularization during training is accomplished by intentionally introducing confusion in the activations. With PC regularization, we obtain state-of-the-art performance on six of the most widely-used FGVC datasets and demonstrate improved localization ability. PC is easy to implement, does not need excessive hyperparameter tuning during training, and does not add significant overhead during test time

[1]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jonathan Krause,et al.  The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition , 2015, ECCV.

[4]  Ross B. Girshick,et al.  Reducing Overfitting in Deep Networks by Decorrelating Representations , 2015, ICLR.

[5]  Gary R. Bradski,et al.  A codebook-free and annotation-free approach for fine-grained image categorization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jianfei Cai,et al.  Weakly Supervised Fine-Grained Categorization With Part-Based Image Representation , 2016, IEEE Transactions on Image Processing.

[7]  Ehsan Adeli,et al.  Deep Relative Attributes , 2015, ACCV.

[8]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[9]  Qi Tian,et al.  Picking Deep Filter Responses for Fine-Grained Image Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[11]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.

[12]  Hefei Ling,et al.  Hierarchical Joint CNN-Based Models for Fine-Grained Cars Recognition , 2016, ICCCS.

[13]  Joachim Denzler,et al.  Generalized Orderless Pooling Performs Implicit Salient Matching , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Abhimanyu Dubey,et al.  Modeling Image Virality with Pairwise Spatial Transformer Networks , 2017, ACM Multimedia.

[15]  Dumitru Erhan,et al.  Training Deep Neural Networks on Noisy Labels with Bootstrapping , 2014, ICLR.

[16]  Shenghuo Zhu,et al.  Efficient Object Detection and Segmentation for Fine-Grained Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Ramesh Raskar,et al.  Deep Learning the City: Quantifying Urban Perception at a Global Scale , 2016, ECCV.

[18]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[19]  Yong Jae Lee,et al.  End-to-End Localization and Ranking for Relative Attributes , 2016, ECCV.

[20]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[21]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[23]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[25]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Pietro Perona,et al.  Bird Species Categorization Using Pose Normalized Deep Convolutional Nets , 2014, ArXiv.

[28]  Trevor Darrell,et al.  Pose pooling kernels for sub-category recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[30]  Jian Yang,et al.  Boosted Convolutional Neural Networks , 2016, BMVC.

[31]  Quoc V. Le,et al.  Adding Gradient Noise Improves Learning for Very Deep Networks , 2015, ArXiv.

[32]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[33]  Subhransu Maji,et al.  Improved Bilinear Pooling with CNNs , 2017, BMVC.

[34]  Yang Gao,et al.  Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Xiao Liu,et al.  Kernel Pooling for Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[37]  Jonghyun Choi,et al.  Mining Discriminative Triplets of Patches for Fine-Grained Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  L. M. M.-T. Theory of Probability , 1929, Nature.

[39]  Yang Gao,et al.  Fine-grained pose prediction, normalization, and recognition , 2015, ArXiv.

[40]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[41]  Abhishek Das,et al.  Grad-CAM: Why did you say that? , 2016, ArXiv.

[42]  Feng Zhou,et al.  Fine-Grained Categorization and Dataset Bootstrapping Using Deep Metric Learning with Humans in the Loop , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Xiaogang Wang,et al.  Learning from massive noisy labeled data for image classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Fei-Fei Li,et al.  Combining randomization and discrimination for fine-grained image categorization , 2011, CVPR 2011.

[45]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[46]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Anders Krogh,et al.  A Simple Weight Decay Can Improve Generalization , 1991, NIPS.

[48]  Maria L. Rizzo,et al.  Energy statistics: A class of statistics based on distances , 2013 .

[49]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[51]  Pietro Perona,et al.  Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[53]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[54]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[55]  Shu Kong,et al.  Low-Rank Bilinear Pooling for Fine-Grained Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).