Paper Information - Fine-Grained Recognition with Automatic and Efficient Part Attention

Fine-grained recognition is challenging because inter-class differences are subtle and local, while intra-class variations, such as pose, are large. A key to addressing this problem is localizing discriminative parts in order to extract pose-invariant features. However, ground-truth part annotations can be expensive to acquire, and for many fine-grained classes it is hard to define parts at all. This work introduces Fully Convolutional Attention Networks (FCANs), a reinforcement learning framework that learns to glimpse local discriminative regions and adapts to different fine-grained domains. Compared to previous methods, our approach has four advantages: 1) the three components (feature extraction, visual attention and fine-grained classification) are unified in an end-to-end system; 2) the weakly supervised reinforcement learning procedure requires no expensive part annotations; 3) the fully convolutional architecture speeds up both training and testing; 4) the greedy reward strategy accelerates the convergence of learning. We demonstrate the effectiveness of our method with extensive experiments on four challenging fine-grained benchmark datasets: Stanford Dogs, Stanford Cars, CUB-200-2011 and Food-101.
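The abstract does not describe the attention mechanism in detail, so the following is only a rough illustration of the general idea of learning where to glimpse from a classification reward, not the FCAN implementation. It assumes a single part, a small hypothetical score map over spatial locations (the kind of output a convolutional attention head might produce), a softmax policy over those locations, and a REINFORCE-style gradient with a 0/1 reward. The names `softmax_2d`, `sample_location` and `reinforce_grad` are invented for this sketch.

```python
import math
import random

def softmax_2d(scores):
    """Normalise an H x W score map into a probability map over locations."""
    flat = [s for row in scores for s in row]
    m = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in flat]
    total = sum(exps)
    w = len(scores[0])
    return [[exps[r * w + c] / total for c in range(w)]
            for r in range(len(scores))]

def sample_location(probs, rng):
    """Draw one (row, col) glimpse location from the probability map."""
    u = rng.random()
    acc = 0.0
    for r, row in enumerate(probs):
        for c, p in enumerate(row):
            acc += p
            if u < acc:
                return r, c
    return len(probs) - 1, len(probs[0]) - 1  # guard against rounding

def reinforce_grad(probs, loc, reward):
    """Gradient of reward * log p(loc) w.r.t. the scores.

    For a softmax policy this is reward * (one_hot(loc) - probs).
    """
    return [[reward * ((1.0 if (r, c) == loc else 0.0) - p)
             for c, p in enumerate(row)]
            for r, row in enumerate(probs)]

# Hypothetical 2 x 3 attention score map (assumed values, for illustration).
rng = random.Random(0)
scores = [[0.0, 1.0, 0.0],
          [0.0, 2.0, 0.0]]
probs = softmax_2d(scores)
loc = sample_location(probs, rng)
# Assumed reward: 1.0 if the classifier is correct on the glimpse, else 0.0.
grad = reinforce_grad(probs, loc, reward=1.0)
```

With a positive reward, the gradient raises the score of the sampled location and lowers the others, so locations whose glimpses lead to correct classifications become more likely to be chosen. Only the image-level label is needed to compute the reward, which is the sense in which such training is weakly supervised.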
