论文信息 - Learning Visual Models Using a Knowledge Graph as a Trainer

Learning Visual Models Using a Knowledge Graph as a Trainer

Traditional computer vision approaches, based on neural networks (NN), are typically trained on a large amount of image data. By minimizing the cross-entropy loss between a prediction and a given class label, the NN and its visual embedding space are learned to fulfill a given task. However, due to the sole dependence on the image data distribution of the training domain, these models tend to fail when applied to a target domain that differs from their source domain. To learn a more robust NN to domain shifts, we propose the knowledge graph neural network (KG-NN), a neuro-symbolic approach that supervises the training using image-data-invariant auxiliary knowledge. The auxiliary knowledge is first encoded in a knowledge graph with respective concepts and their relationships, which is then transformed into a dense vector representation via an embedding method. Using a contrastive loss function, KG-NN learns to adapt its visual embedding space and thus its weights according to the image-data invariant knowledge graph embedding space. We evaluate KG-NN on visual transfer learning tasks for classification using the mini-ImageNet dataset and its derivatives, as well as road sign recognition datasets from Germany and China. The results show that a visual model trained with a knowledge graph as a trainer outperforms a model trained with cross-entropy in all experiments, in particular when the domain gap increases. Besides better performance and stronger robustness to domain shifts, these KG-NN adapts to multiple datasets and classes without suffering heavily from catastrophic forgetting.

Achim Rettinger | Lavdim Halilaj | Stefan Schmid | Sebastian Monka

[1] Jens Lehmann,et al. DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[2] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[3] Peter Bloem,et al. End-to-End Entity Classification on Multimodal Knowledge Graphs , 2020, ArXiv.

[4] Catherine Havasi,et al. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge , 2016, AAAI.

[5] Kihyuk Sohn,et al. Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[6] Samy Bengio,et al. Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[7] Jie Lin,et al. End-to-End Video Classification with Knowledge Graphs , 2017, ArXiv.

[8] D. Song,et al. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[9] Eric P. Xing,et al. Learning Robust Global Representations by Penalizing Local Predictive Power , 2019, NeurIPS.

[10] Dawn Song,et al. Natural Adversarial Examples , 2019, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Christopher D. Manning,et al. Contrastive Learning of Medical Visual Representations from Paired Images and Text , 2020, MLHC.

[12] Yi Yang,et al. Towards Real-Time Traffic Sign Detection and Classification , 2016, IEEE Transactions on Intelligent Transportation Systems.

[13] Benjamin Recht,et al. Do ImageNet Classifiers Generalize to ImageNet? , 2019, ICML.

[14] Yue Wang,et al. Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? , 2020, ECCV.

[15] Kaiming He,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Changsheng Xu,et al. I Know the Relationships: Zero-Shot Action Recognition via Two-Stream Graph Convolutional Networks and Knowledge Graphs , 2019, AAAI.

[17] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[18] Geoffrey E. Hinton,et al. Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[19] Jiaoyan Chen,et al. Knowledge-Based Explanations for Transfer Learning , 2020, Knowledge Graphs for eXplainable Artificial Intelligence.

[20] Andrew Y. Ng,et al. Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[21] Pasquale Minervini,et al. Convolutional 2D Knowledge Graph Embeddings , 2017, AAAI.

[22] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[23] StallkampJ.,et al. 2012 Special Issue , 2012 .

[24] Tom Michael Mitchell,et al. Predicting Human Brain Activity Associated with the Meanings of Nouns , 2008, Science.

[25] Allan Jabri,et al. Learning Visual Features from Large Weakly Supervised Data , 2015, ECCV.

[26] Lorenzo Rosasco,et al. Holographic Embeddings of Knowledge Graphs , 2015, AAAI.

[27] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[28] Chao Yang,et al. A Survey on Deep Transfer Learning , 2018, ICANN.

[29] Steffen Staab,et al. Knowledge graphs , 2021, Commun. ACM.

[30] Ce Liu,et al. Supervised Contrastive Learning , 2020, NeurIPS.

[31] Yu-Chiang Frank Wang,et al. Multi-label Zero-Shot Learning with Structured Knowledge Graphs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32] Venkatesh Saligrama,et al. Zero-Shot Learning via Semantic Similarity Embedding , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.

[34] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.

[35] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[36] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..

[37] Cordelia Schmid,et al. Label-Embedding for Image Classification , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38] Xinlei Chen,et al. Iterative Visual Reasoning Beyond Convolutions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39] Xiu-Shen Wei,et al. Multi-Label Image Recognition With Graph Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40] Alexander D'Amour,et al. Underspecification Presents Challenges for Credibility in Modern Machine Learning , 2020, J. Mach. Learn. Res..

[41] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.

[42] Thomas G. Dietterich,et al. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.

[43] Yann LeCun,et al. Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[44] Huajun Chen,et al. Human-centric Transfer Learning Explanation via Knowledge Graph [Extended Abstract] , 2019, ArXiv.

[45] Abhinav Gupta,et al. Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46] Chunhua Shen,et al. Explicit Knowledge-based Reasoning for Visual Question Answering , 2015, IJCAI.

[47] Robert Tibshirani,et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2001, Springer Series in Statistics.

[48] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[49] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Barbara Plank,et al. Learning to select data for transfer learning with Bayesian Optimization , 2017, EMNLP.

[51] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[52] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.

[53] Dai Quoc Nguyen,et al. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network , 2017, NAACL.