Hierarchical deep transfer learning for fine-grained categorization on micro datasets

Abstract Fine-grained categorization is challenging because of its small inter-class variance and large intra-class variance. Moreover, because labelling fine-grained data requires domain expertise, such labelled data is much more expensive to acquire. Existing models predominantly require extra information, such as bounding boxes and part annotations, in addition to image category labels, which involves heavy manual labour. In this paper, we propose a novel hierarchical deep transfer learning model based on a compression convolutional neural network. Our model transfers the learned image representation from large-scale labelled fine-grained datasets to micro fine-grained datasets, which avoids the need for expensive annotations and performs the visual categorization task effectively. First, we introduce a cohesion-domain metric to measure the degree of correlation between the source domain and the target domain. Second, the source-domain convolutional neural network is adjusted according to this metric's feedback, in order to select task-specific features that are suitable for transfer to the target domain. Finally, we make the most of perspective-class labels, which are inherent attributes of fine-grained data, for multi-task learning, and learn all the attributes jointly to extract more discriminative representations. The proposed model not only reduces training time and achieves high categorization accuracy, but also verifies that inter-domain feature transfer can accelerate learning and optimization.
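The domain-correlation step described above can be sketched in code. The abstract does not specify the cohesion metric, so this minimal Python sketch assumes a cosine similarity between mean source- and target-domain feature vectors as the cohesion score, and a per-dimension statistic-agreement rule for selecting transferable features; the names `domain_cohesion` and `select_transferable_dims` are hypothetical, not the paper's implementation.

```python
import numpy as np

def domain_cohesion(source_feats, target_feats):
    """Hypothetical cohesion score: cosine similarity between the
    mean feature vectors of the source and target domains."""
    mu_s = source_feats.mean(axis=0)
    mu_t = target_feats.mean(axis=0)
    denom = np.linalg.norm(mu_s) * np.linalg.norm(mu_t) + 1e-12
    return float(mu_s @ mu_t / denom)

def select_transferable_dims(source_feats, target_feats, threshold=0.5):
    """Keep feature dimensions whose mean activations agree across
    domains; these are candidates for transfer to the target task."""
    mu_s = source_feats.mean(axis=0)
    mu_t = target_feats.mean(axis=0)
    # agreement is 1.0 when the per-dimension means coincide, lower otherwise
    agreement = 1.0 - np.abs(mu_s - mu_t) / (np.abs(mu_s) + np.abs(mu_t) + 1e-12)
    return np.where(agreement >= threshold)[0]

# Illustrative usage with random stand-ins for CNN feature activations
rng = np.random.default_rng(0)
source = rng.normal(size=(100, 16))             # source-domain features
target = source + rng.normal(scale=0.1, size=(100, 16))  # correlated target
score = domain_cohesion(source, target)
dims = select_transferable_dims(source, target)
```

In the model described by the abstract, a score like `score` would feed back into the source-domain network to decide which features (here approximated by `dims`) are worth transferring to the micro target dataset.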
