Exploiting spatial relation for fine-grained image classification

Abstract: Fine-Grained Image Classification (FGIC) aims to distinguish images of subordinate-level categories. Many FGIC methods have been proposed recently, and substantial progress has been made in part detection and feature learning. However, FGIC remains challenging because of large intra-class variance and small inter-class variance. To classify fine-grained images accurately, this paper exploits spatial relations to capture more discriminative details. The proposed method contains two core modules: a part selection module and a representation module. The part selection module uses the intrinsic spatial relations between object parts to select the part pairs with high discriminative power. The representation module exploits the interaction between object parts to describe the selected part pairs and to construct a semantic image representation. The proposed method is evaluated on the CUB-200-2011 and FGVC-Aircraft datasets. Experimental results show that it reaches a classification accuracy of 85.5% on CUB-200-2011 and 86.9% on FGVC-Aircraft, clearly outperforming the compared methods.
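
The two-module pipeline described above can be illustrated with a minimal sketch. The abstract does not specify how part pairs are scored or how part interaction is computed, so everything below is an assumption made for illustration: the function names `select_part_pairs` and `pair_representation`, the proximity-weighted pair score, and the outer-product interaction with signed square-root normalization are hypothetical stand-ins, not the paper's actual modules.

```python
import numpy as np


def select_part_pairs(centers, scores, k):
    """Keep the k highest-ranked part pairs.

    ASSUMPTION: the ranking rule (product of part scores, damped by the
    distance between part centers) is a hypothetical stand-in for the
    paper's spatial-relation criterion, which the abstract does not give.
    """
    n = len(centers)
    ranked = []
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(centers[i] - centers[j])
            ranked.append((scores[i] * scores[j] / (1.0 + dist), i, j))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return [(i, j) for _, i, j in ranked[:k]]


def pair_representation(features, pairs):
    """Describe each selected pair and concatenate the descriptions.

    ASSUMPTION: the interaction is modeled as an outer product of the two
    part features, followed by signed square root and L2 normalization.
    This is one common choice, not necessarily the paper's.
    """
    blocks = []
    for i, j in pairs:
        inter = np.outer(features[i], features[j]).ravel()
        inter = np.sign(inter) * np.sqrt(np.abs(inter))
        norm = np.linalg.norm(inter)
        blocks.append(inter / norm if norm > 0 else inter)
    return np.concatenate(blocks)


# Toy input: three parts with 2-D centers, confidence scores, 4-D features.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
scores = np.array([0.9, 0.8, 0.7])
features = np.arange(1.0, 13.0).reshape(3, 4)

pairs = select_part_pairs(centers, scores, k=2)
image_repr = pair_representation(features, pairs)
```

With these toy inputs, the two nearby high-scoring parts form the top-ranked pair, and the image representation is the concatenation of two normalized 16-dimensional pair descriptors. In practice the part features would come from a convolutional network and the resulting vector would be passed to a classifier.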
