ConTraKG: Contrastive-based Transfer Learning for Visual Object Recognition using Knowledge Graphs

Deep learning techniques achieve high accuracy in computer vision tasks. However, their accuracy drops considerably under a domain change, i.e., as soon as they are used in a domain that differs from their training domain. For example, a road sign recognition model trained to recognize road signs in Germany performs poorly in countries with different road sign standards, such as China. We propose ConTraKG, a neuro-symbolic approach that enables cross-domain transfer learning based on prior knowledge about the domain or context. A knowledge graph serves as a medium for encoding such prior knowledge, which is then transformed into a dense vector representation via embedding methods. Using a five-phase training pipeline and a contrastive loss function, we train the deep neural network to align its visual embedding space with the domain-invariant embedding space of the knowledge graph. This allows the neural network to incorporate training data from different target domains that are already represented in the knowledge graph. We conduct a series of empirical evaluations to determine the accuracy of our approach. The results show that ConTraKG is significantly more accurate than the conventional approach for dealing with domain changes. In a transfer learning setup, where the network is trained on both domains, ConTraKG achieves 21% higher accuracy when tested on the source domain and 15% higher accuracy when tested on the target domain than the standard approach. Moreover, with only 10% of the target data for training, it achieves the same accuracy as the cross-entropy-based model trained on the full target data.
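
The core idea described above can be illustrated with a minimal sketch: an image encoder is trained with a contrastive objective so that its visual embeddings move toward the knowledge graph embedding of the correct class and away from the embeddings of other classes. The sketch below is an assumption-based illustration in PyTorch, not the paper's actual five-phase pipeline; all names (VisualEncoder, kg_embeddings, temperature, the ResNet-18 backbone, the embedding dimension) are placeholders introduced here for clarity.

```python
# Minimal sketch of contrastive alignment between visual embeddings and
# knowledge graph (KG) class embeddings. Hypothetical names and parameters;
# the paper's concrete architecture and training phases are not reproduced.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class VisualEncoder(nn.Module):
    """Maps images into the (domain-invariant) KG embedding space."""

    def __init__(self, kg_dim: int = 200):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()               # drop the classification head
        self.backbone = backbone
        self.projection = nn.Linear(512, kg_dim)  # project to the KG dimension

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        features = self.backbone(images)
        return F.normalize(self.projection(features), dim=-1)


def contrastive_alignment_loss(visual_emb, kg_embeddings, labels, temperature=0.1):
    """InfoNCE-style loss: the KG embedding of the true class is the positive,
    the KG embeddings of all other classes act as negatives."""
    kg_embeddings = F.normalize(kg_embeddings, dim=-1)
    logits = visual_emb @ kg_embeddings.t() / temperature  # (batch, num_classes)
    return F.cross_entropy(logits, labels)


# Usage example with random stand-in data (e.g. 43 traffic-sign classes).
if __name__ == "__main__":
    num_classes, kg_dim = 43, 200
    kg_embeddings = torch.randn(num_classes, kg_dim)   # pre-trained KG embeddings
    model = VisualEncoder(kg_dim)
    images = torch.randn(8, 3, 64, 64)
    labels = torch.randint(0, num_classes, (8,))
    loss = contrastive_alignment_loss(model(images), kg_embeddings, labels)
    loss.backward()
    print(f"contrastive alignment loss: {loss.item():.4f}")
```

Because the targets are class-level KG embeddings rather than one-hot labels, images from a different domain can be trained against the same embedding space as long as their classes are represented in the knowledge graph, which is what makes the cross-domain transfer described in the abstract possible.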
