Paper information - A coarse-to-fine capsule network for fine-grained image categorization

Abstract Fine-grained image categorization is challenging because the subordinate categories within an entry-level category can be distinguished only by subtle differences. This requires alternately localizing the key (most discriminative) regions and extracting domain-specific features. Most existing methods handle these steps independently, ignoring that representation learning and foreground localization can reinforce each other iteratively. Building on the state-of-the-art performance of capsule encoding for abstract semantic representation, we formalize our pipeline as a coarse-to-fine capsule network (CTF-CapsNet). It consists of customized expert CapsNets, one at each perception scale, and region proposal networks (RPNs) between each pair of adjacent scales. Their mutually reinforcing self-optimization achieves increasingly specialized cross-utilization of object-level and component-level descriptions. Each RPN zooms in on the most distinctive regions, using the information learned by the preceding expert CapsNet as a reference, while the finer-scale model takes as input an amplified attended patch from the previous scale. Overall, CTF-CapsNet is driven by three focal margin losses between the label predictions and the ground truth, and three regeneration losses between the original input images/feature maps and the reconstructed images. Experiments demonstrate that, without any prior knowledge or strong supervision (e.g., bounding-box or part annotations), CTF-CapsNet delivers categorization performance competitive with state-of-the-art methods: test accuracy reaches 89.57%, 88.63%, 90.51%, and 91.53% on our hand-crafted rice growth image set and three public benchmarks (CUB Birds, Stanford Dogs, and Stanford Cars), respectively.
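The way the objective combines one classification loss and one regeneration loss per scale can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not define the focal margin loss, so the sketch substitutes the standard capsule margin loss from "Dynamic Routing Between Capsules"; the function names, the use of mean squared error for the regeneration term, and the `recon_weight` value are all assumptions made for illustration.

```python
def margin_loss(lengths, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Standard CapsNet margin loss for one sample (a stand-in for the
    paper's focal margin loss, whose exact form is not given here).

    lengths: capsule output lengths in [0, 1], one per class
    target:  index of the ground-truth class
    """
    total = 0.0
    for k, v in enumerate(lengths):
        if k == target:
            # Penalize a too-short capsule for the true class.
            total += max(0.0, m_pos - v) ** 2
        else:
            # Penalize a too-long capsule for any other class.
            total += lam * max(0.0, v - m_neg) ** 2
    return total


def reconstruction_loss(original, reconstructed):
    """Mean squared error between flattened inputs (assumed form of the
    regeneration loss)."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def total_loss(scales, recon_weight=0.0005):
    """Sum one margin loss and one weighted regeneration loss per scale.

    scales: list of dicts with hypothetical keys 'lengths', 'target',
            'original', 'reconstructed' (three entries for three scales).
    recon_weight: assumed down-weighting of the regeneration term.
    """
    return sum(
        margin_loss(s["lengths"], s["target"])
        + recon_weight * reconstruction_loss(s["original"], s["reconstructed"])
        for s in scales
    )


# Usage: three scales, each with its own prediction and reconstruction.
scales = [
    {"lengths": [0.5, 0.1], "target": 0, "original": [1.0, 0.0], "reconstructed": [0.0, 0.0]},
    {"lengths": [0.9, 0.1], "target": 0, "original": [1.0, 0.0], "reconstructed": [1.0, 0.0]},
    {"lengths": [0.9, 0.1], "target": 0, "original": [0.5, 0.5], "reconstructed": [0.5, 0.5]},
]
loss = total_loss(scales)
```

Only the first scale contributes here, since the other two predict and reconstruct perfectly; in a real model the per-scale terms would be computed from network outputs and minimized jointly.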
