Fine-Grained Image Classification Using Modified DCNNs Trained by Cascaded Softmax and Generalized Large-Margin Losses

We develop a fine-grained image classifier based on a general deep convolutional neural network (DCNN). We improve the fine-grained image classification accuracy of a DCNN model in two ways. First, to better model the $h$-level hierarchical label structure of the fine-grained image classes in the training data set, we replace the top fully connected (fc) layer of a given DCNN model with $h$ fc layers and train them with the cascaded softmax loss. Second, we propose a novel loss function, the generalized large-margin (GLM) loss, which makes the DCNN model explicitly exploit the hierarchical label structure and the similarity regularities of the fine-grained image classes. The GLM loss not only explicitly reduces the between-class similarity and within-class variance of the features learned by DCNN models but also makes subclasses of the same coarse class more similar to each other in the feature space than subclasses of different coarse classes are. Moreover, the proposed fine-grained image classification framework is independent of the network architecture and can be applied to any DCNN structure. Comprehensive experimental evaluations of several general DCNN models (AlexNet, GoogLeNet, and VGG) on three benchmark data sets for fine-grained image classification (Stanford Cars, Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft), and CUB-200-2011) demonstrate the effectiveness of our method.
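
The abstract does not give formulas for either loss, so the following is only a rough illustration under stated assumptions. It assumes (i) that the cascaded softmax loss can be approximated as the sum of one softmax cross-entropy per label level, with each of the $h$ fc heads producing its own logits, and (ii) that the hierarchical idea behind GLM can be illustrated by a center-loss-style compactness term plus a hinge on class centers. The function names, the hinge form, and the `margin` parameter are hypothetical; this is not the paper's GLM formulation.

```python
import math


def softmax_cross_entropy(logits, target):
    """-log softmax(logits)[target] for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def cascaded_softmax_loss(level_logits, level_labels):
    """ASSUMED form: one fc head per hierarchy level, each with its own softmax loss,
    summed over the h levels (the paper's exact cascade may differ)."""
    return sum(softmax_cross_entropy(z, y) for z, y in zip(level_logits, level_labels))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_class_variance(features, labels, centers):
    """Center-loss-style compactness: mean squared distance from each feature to its class center."""
    return sum(sq_dist(f, centers[y]) for f, y in zip(features, labels)) / len(features)


def hierarchical_margin_loss(centers, parent, margin=1.0):
    """HYPOTHETICAL hinge surrogate, not the paper's GLM loss: for each fine class,
    even its farthest sibling center (same coarse parent) should be closer than its
    nearest non-sibling center by at least `margin` (in squared distance)."""
    loss = 0.0
    for i, ci in enumerate(centers):
        sib = [sq_dist(ci, cj) for j, cj in enumerate(centers) if j != i and parent[j] == parent[i]]
        non = [sq_dist(ci, cj) for j, cj in enumerate(centers) if parent[j] != parent[i]]
        if sib and non:  # skip classes with no siblings or no other coarse class
            loss += max(0.0, max(sib) + margin - min(non))
    return loss


if __name__ == "__main__":
    # Two-level hierarchy: fine logits over 3 classes, coarse logits over 2 classes.
    print(cascaded_softmax_loss([[2.0, 1.0, 0.1], [0.5, 0.5]], [0, 1]))
    # Four fine classes in 1-D; classes 0 and 1 share coarse parent 0, classes 2 and 3 parent 1.
    print(hierarchical_margin_loss([[0.0], [1.0], [5.0], [6.0]], [0, 0, 1, 1]))  # well separated
    print(hierarchical_margin_loss([[0.0], [4.0], [1.0], [5.0]], [0, 0, 1, 1]))  # siblings far apart
```

In training, such terms would typically be added with weighting coefficients (for example, the cascaded softmax loss plus a weighted sum of the compactness and margin terms); those weights are a further assumption and are not specified in the abstract.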
