A coarse-to-fine capsule network for fine-grained image categorization

Abstract Fine-grained image categorization is challenging because the subordinate categories within an entry-level category can only be distinguished by subtle differences. This necessitates alternately localizing the key (most discriminative) regions and extracting domain-specific features. Existing methods predominantly treat these two tasks independently, ignoring that representation learning and foreground localization can reinforce each other iteratively. Building on the state-of-the-art performance of capsule encoding for abstract semantic representation, we formalize our pipeline as a coarse-to-fine capsule network (CTF-CapsNet). It consists of customized expert CapsNets, one at each perception scale, and region proposal networks (RPNs) between each pair of adjacent scales. Their mutually reinforcing self-optimization achieves increasingly specialized cross-utilization of object-level and component-level descriptions. Each RPN zooms in on the most distinctive regions, using the information learned by the preceding expert CapsNet as a reference, while the finer-scale model takes as input an amplified attended patch from the previous scale. Overall, CTF-CapsNet is driven by three focal margin losses between the label predictions and the ground truth, and three regeneration losses between the original input images/feature maps and the reconstructed images. Experiments demonstrate that, without any prior knowledge or strong supervision (e.g., bounding-box/part annotations), CTF-CapsNet delivers categorization performance competitive with state-of-the-art methods: testing accuracy reaches 89.57%, 88.63%, 90.51%, and 91.53% on our hand-crafted rice growth image set and three public benchmarks (CUB Birds, Stanford Dogs, and Stanford Cars), respectively.
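The objective described above (one classification loss and one regeneration loss per scale, over three scales) can be illustrated with a minimal sketch. The abstract does not define the focal margin loss, so the sketch substitutes the standard capsule margin loss from Dynamic Routing Between Capsules (m+ = 0.9, m- = 0.1, lambda = 0.5). The function names, the sum-of-squared-errors regeneration term, and the weight `recon_weight` are assumptions for illustration, not the authors' formulation.

```python
def margin_loss(lengths, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Standard capsule margin loss for one sample (NOT the paper's focal variant).

    lengths: capsule output norms, one per class, each in [0, 1].
    target:  index of the ground-truth class.
    """
    total = 0.0
    for k, v in enumerate(lengths):
        if k == target:
            # Penalize the true class when its capsule norm falls below m_pos.
            total += max(0.0, m_pos - v) ** 2
        else:
            # Penalize other classes when their norm exceeds m_neg, down-weighted by lam.
            total += lam * max(0.0, v - m_neg) ** 2
    return total


def regeneration_loss(original, reconstructed):
    """Sum of squared errors between flattened input and reconstruction (assumed form)."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))


def total_loss(scales, recon_weight=0.0005):
    """Sum the per-scale losses over all scales (three in the abstract).

    scales: list of dicts with keys 'lengths', 'target', 'original', 'reconstructed'.
    recon_weight: assumed scaling factor for the regeneration term.
    """
    return sum(
        margin_loss(s["lengths"], s["target"])
        + recon_weight * regeneration_loss(s["original"], s["reconstructed"])
        for s in scales
    )
```

In this sketch each scale contributes independently to the total, so a scale whose capsule norms already satisfy both margins and whose reconstruction is exact adds nothing to the loss.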
