Weakly Supervised Fine-Grained Image Classification via Salient Region Localization and Different Layer Feature Fusion

Fine-grained image classification aims to distinguish subordinate classes within the same object category. The task is difficult because intra-class variance is large while inter-class variance is small. As a result, improving accuracy has relied heavily on annotations of discriminative parts and regions. This dependence on fine annotations limits the practicality of such models. To address this issue, this article proposes a weakly supervised fine-grained image classification model built on a saliency module. The salient region localization module uses saliency maps to localize essential regional parts while requiring only image-level class annotations. In addition, a bilinear attention module improves feature extraction by using higher- and lower-level layers of the network to fuse regional features with global features. Building on this bilinear attention architecture, we propose a different layer feature fusion module to improve the expressive power of the model's features. We evaluated the model on public datasets released specifically for fine-grained image classification. The results show that the proposed model achieves close to state-of-the-art classification performance on various datasets, even when only minimal training data are provided. This indicates that the model's practicality is substantially improved, since fine-grained image datasets are expensive.
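The two ideas named in the abstract, localizing a region from a saliency map and fusing features from different layers bilinearly, can be illustrated with a minimal NumPy sketch. This is not the article's implementation: the function names `salient_bbox` and `bilinear_fuse`, the mean-value threshold, and the signed square-root normalization are all assumptions chosen for illustration.

```python
import numpy as np


def salient_bbox(saliency, threshold=None):
    """Return (top, left, bottom, right) enclosing pixels above the threshold.

    Assumption: the mean saliency is used as a simple cutoff when no
    threshold is given; the article may use a different rule.
    """
    s = np.asarray(saliency, dtype=float)
    if threshold is None:
        threshold = s.mean()
    mask = s > threshold
    if not mask.any():
        # Nothing salient: fall back to the whole image.
        return 0, 0, s.shape[0], s.shape[1]
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1


def bilinear_fuse(low, high):
    """Fuse two feature maps of shapes (C1, H, W) and (C2, H, W).

    Computes the outer product of the channel vectors at each location,
    averages over locations, then applies signed square root and L2
    normalization (a common choice, assumed here).
    """
    c1, h, w = low.shape
    c2 = high.shape[0]
    a = low.reshape(c1, h * w)
    b = high.reshape(c2, h * w)
    pooled = (a @ b.T) / (h * w)          # shape (C1, C2)
    v = pooled.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))   # signed square root
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


# Toy usage: a 6x6 saliency map with one bright rectangle.
saliency = np.zeros((6, 6))
saliency[2:4, 1:5] = 1.0
box = salient_bbox(saliency)

rng = np.random.default_rng(0)
fused = bilinear_fuse(rng.normal(size=(3, 4, 4)), rng.normal(size=(5, 4, 4)))
```

In this sketch the bounding box would be used to crop the regional input, and the fused vector stands in for the combined regional and global descriptor passed to a classifier.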
