A concept ontology triplet network for learning discriminative representations of fine-grained classes

Triplet network is an efficient method of metric learning, but with the increase of the number of fine-grained images and sample categories, the training of Triplet network is more and more challengeable. In order to solve this problem, this paper proposes an algorithm that effectively combine Concept Ontology Structure with the Triplet network trained of Two-layer Ontology Loss. It not only utilizes semantic knowledge to guide the Concept Ontology Structure of the network, but also makes use of the relationship between the layers to make the network more effective to see the triplets, which enhances the separability of the learned features. At the same time, we also use the bilinear function jointly trained with the Triplet network to enhance the image details, further improving the performance of the network. Finally, the effectiveness of the proposed algorithm is also proved by the results of classification experiments on the fine-grained image databases - Orchid and Fashion60.

[1]  Silvio Savarese,et al.  Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies , 2013, 2013 IEEE International Conference on Computer Vision.

[2]  Yi Yang,et al.  Augmenting Image Descriptions Using Structured Prediction Output , 2014, IEEE Transactions on Multimedia.

[3]  Lucas Beyer,et al.  In Defense of the Triplet Loss for Person Re-Identification , 2017, ArXiv.

[4]  Feng Zhou,et al.  Embedding Label Structures for Fine-Grained Feature Representation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jonathon Shlens,et al.  Fast, Accurate Detection of 100,000 Object Classes on a Single Machine , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Nanning Zheng,et al.  Person Re-identification by Multi-Channel Parts-Based CNN with Improved Triplet Loss Function , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[8]  Shuicheng Yan,et al.  Training Group Orthogonal Neural Networks with Privileged Information , 2017, IJCAI.

[9]  Gang Hua,et al.  Labeled Faces in the Wild: A Survey , 2016 .

[10]  Tieniu Tan,et al.  Deep semantic ranking based hashing for multi-label image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Kaiqi Huang,et al.  Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Zhenghang Yuan,et al.  GETNET: A General End-to-end Two-dimensional CNN Framework for Hyperspectral Image Change Detection , 2019, ArXiv.

[14]  Carlos D. Castillo,et al.  Triplet probabilistic embedding for face verification and clustering , 2016, 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[15]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Robinson Piramuthu,et al.  HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[18]  Kaiqi Huang,et al.  A Multi-Task Deep Network for Person Re-Identification , 2016, AAAI.

[19]  Y. Rui,et al.  Learning to Rank Using User Clicks and Visual Features for Image Retrieval , 2015, IEEE Transactions on Cybernetics.

[20]  Kavita Bala,et al.  Learning visual similarity for product design with convolutional neural networks , 2015, ACM Trans. Graph..

[21]  Xuelong Li,et al.  Scene Classification With Recurrent Attention of VHR Remote Sensing Images , 2019, IEEE Transactions on Geoscience and Remote Sensing.

[22]  Priyadarshini Panda,et al.  Tree-CNN: A hierarchical Deep Convolutional Neural Network for incremental learning , 2018, Neural Networks.

[23]  Alexander J. Smola,et al.  Sampling Matters in Deep Embedding Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Jianping Fan,et al.  Deep Learning of Path-Based Tree Classifiers for Large-Scale Plant Species Identification , 2018, 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR).

[25]  Chong Wang,et al.  How to Train Triplet Networks with 100K Identities? , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[26]  Li Ying Orthogonal incremental extreme learning machine for regression and multiclass classification , 2014, Neural Computing and Applications.

[27]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[28]  Joan Bruna,et al.  Training Convolutional Networks with Noisy Labels , 2014, ICLR 2014.

[29]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[30]  Jun Yu,et al.  Click Prediction for Web Image Reranking Using Multimodal Sparse Coding , 2014, IEEE Transactions on Image Processing.

[31]  Yuxiao Hu,et al.  MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.

[32]  Guoying Zhao,et al.  Spatiotemporal Recurrent Convolutional Networks for Recognizing Spontaneous Micro-Expressions , 2019, IEEE Transactions on Multimedia.

[33]  Samy Bengio,et al.  Large-Scale Object Classification Using Label Relation Graphs , 2014, ECCV.

[34]  Yang Song,et al.  Learning Fine-Grained Image Similarity with Deep Ranking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Jianping Fan,et al.  Deep Multi-task Learning for Large-Scale Image Classification , 2017, 2017 IEEE Third International Conference on Multimedia Big Data (BigMM).

[36]  Jiangqun Ni,et al.  Deep Learning Hierarchical Representations for Image Steganalysis , 2017, IEEE Transactions on Information Forensics and Security.

[37]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[38]  Frédéric Jurie,et al.  Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classiffication , 2016, ECCV.

[39]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[40]  Sebastian Ruder,et al.  An Overview of Multi-Task Learning in Deep Neural Networks , 2017, ArXiv.

[41]  Silvio Savarese,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Xiang Yu,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2016 .

[43]  Jun Yu,et al.  Exploiting Click Constraints and Multi-view Features for Image Re-ranking , 2014, IEEE Transactions on Multimedia.

[44]  Qian Du,et al.  GETNET: A General End-to-End 2-D CNN Framework for Hyperspectral Image Change Detection , 2019, IEEE Transactions on Geoscience and Remote Sensing.

[45]  Yihong Gong,et al.  Deep Metric Learning with Improved Triplet Loss for Face Clustering in Videos , 2016, PCM.

[46]  Jianping Fan,et al.  HD-MTL: Hierarchical Deep Multi-Task Learning for Large-Scale Visual Recognition , 2017, IEEE Transactions on Image Processing.

[47]  Nir Ailon,et al.  Deep Metric Learning Using Triplet Network , 2014, SIMBAD.

[48]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[50]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Ian D. Reid,et al.  Fast Training of Triplet-Based Deep Binary Embedding Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[54]  Xuelong Li,et al.  Block-Row Sparse Multiview Multilabel Learning for Image Classification , 2016, IEEE Transactions on Cybernetics.

[55]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[56]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[57]  Qi Wang,et al.  Robust Hierarchical Deep Learning for Vehicular Management , 2019, IEEE Transactions on Vehicular Technology.

[58]  Jianping Fan,et al.  Leveraging Content Sensitiveness and User Trustworthiness to Recommend Fine-Grained Privacy Settings for Social Image Sharing , 2018, IEEE Transactions on Information Forensics and Security.

[59]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[60]  Feiping Nie,et al.  Detecting Coherent Groups in Crowd Scenes by Multiview Clustering , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.