See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification

Data augmentation is usually adopted to increase the amount of training data, prevent overfitting and improve the performance of deep models. However, in practice, random data augmentation, such as random image cropping, is low-efficiency and might introduce many uncontrolled background noises. In this paper, we propose Weakly Supervised Data Augmentation Network (WS-DAN) to explore the potential of data augmentation. Specifically, for each training image, we first generate attention maps to represent the object's discriminative parts by weakly supervised learning. Next, we augment the image guided by these attention maps, including attention cropping and attention dropping. The proposed WS-DAN improves the classification accuracy in two folds. In the first stage, images can be seen better since more discriminative parts' features will be extracted. In the second stage, attention regions provide accurate location of object, which ensures our model to look at the object closer and further improve the performance. Comprehensive experiments in common fine-grained visual classification datasets show that our WS-DAN surpasses the state-of-the-art methods, which demonstrates its effectiveness.

[1]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Zhi-Hua Zhou,et al.  A brief introduction to weakly supervised learning , 2018 .

[3]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Tao Mei,et al.  Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[6]  Yuxin Peng,et al.  The application of two-level attention models in deep convolutional neural network for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Nojun Kwak,et al.  Analysis on the Dropout Effect in Convolutional Neural Networks , 2016, ACCV.

[8]  Nanning Zheng,et al.  Grassmann Pooling as Compact Homogeneous Bilinear Pooling for Fine-Grained Visual Classification , 2018, ECCV.

[9]  Dong Wang,et al.  Learning to Navigate for Fine-grained Classification , 2018, ECCV.

[10]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[11]  Naila Murray,et al.  Revisiting the Fisher vector for fine-grained classification , 2014, Pattern Recognit. Lett..

[12]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[13]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[14]  Qi Tian,et al.  Picking Deep Filter Responses for Fine-Grained Image Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Ngoc Thang Vu,et al.  Densely Connected Convolutional Networks for Speech Recognition , 2018, ITG Symposium on Speech Communication.

[16]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[17]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xiu-Shen Wei,et al.  Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization , 2018, Pattern Recognit..

[19]  Tao Mei,et al.  Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Ramesh Raskar,et al.  Pairwise Confusion for Fine-Grained Visual Classification , 2017, ECCV.

[21]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Qingming Huang,et al.  Facial Landmarks Detection by Self-Iterative Regression based Landmarks-Attention Network , 2018, AAAI.

[23]  Subhransu Maji,et al.  Improved Bilinear Pooling with CNNs , 2017, BMVC.

[24]  Errui Ding,et al.  Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition , 2018, ECCV.

[25]  Larry S. Davis,et al.  Learning a Discriminative Filter Bank Within a CNN for Fine-Grained Recognition , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Qilong Wang,et al.  Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Yong Jae Lee,et al.  Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Jonghyun Choi,et al.  Mining Discriminative Triplets of Patches for Fine-Grained Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[32]  Marcel Simon,et al.  Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[34]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[35]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[36]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[37]  Fei Yang,et al.  Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Xiao Liu,et al.  Fully Convolutional Attention Localization Networks: Efficient Attention Localization for Fine-Grained Recognition , 2016, ArXiv.

[39]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[40]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Policies from Data , 2018, ArXiv.

[41]  Zhiqiang Shen,et al.  Multiple Granularity Descriptors for Fine-Grained Categorization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[42]  Andrea Vedaldi,et al.  Weakly Supervised Deep Detection Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Yi Yang,et al.  Adversarial Complementary Learning for Weakly Supervised Object Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.

[46]  Cewu Lu,et al.  Deep LAC: Deep localization, alignment and classification for fine-grained recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).