Not Just a Matter of Semantics: The Relationship Between Visual and Semantic Similarity

Knowledge transfer, zero-shot learning and semantic image retrieval are methods that aim to improve accuracy by using semantic information, e.g., from WordNet. They assume that this information can augment or replace missing visual data in the form of labeled training images, because semantic similarity aligns to some degree with visual similarity. This assumption may seem trivial, but it is crucial for the application of such semantic methods: any violation can cause mispredictions. It is therefore important to examine the visual-semantic relationship for a given target problem. In this paper, we use five semantic and five visual similarity measures to analyze the relationship thoroughly without relying too much on any single definition. We postulate and verify three highly consequential hypotheses about the relationship. Our results show that the relationship indeed exists and that WordNet semantic similarity carries more information about visual similarity than just the knowledge that "different classes look different". The results also suggest that classification is not the ideal application for semantic methods and that wrong semantic information is much worse than none.
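
To make the idea of comparing semantic and visual similarity concrete, the following minimal sketch scores every pair of classes once with a taxonomy-based semantic measure and once with a feature-based visual measure, then checks how well the two rankings agree. Everything in it is an assumption made for illustration: the toy taxonomy `PARENT` stands in for WordNet, the three-dimensional vectors in `FEATURES` are made up, and the path-based similarity, cosine similarity and Spearman rank correlation are only one plausible combination, not necessarily the measures used in the paper.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "cat": "mammal", "dog": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
    "car": "vehicle", "vehicle": "object", "animal": "object",
}

# Hypothetical "visual" feature vectors, made up for illustration only.
FEATURES = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "sparrow": [0.5, 0.3, 0.4],
    "car": [0.1, 0.2, 0.9],
}


def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def path_similarity(a, b):
    """1 / (1 + edges on the path through the lowest common ancestor)."""
    pa, pb = ancestors(a), ancestors(b)
    common = next(n for n in pa if n in pb)
    return 1.0 / (1 + pa.index(common) + pb.index(common))


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Score every unordered class pair with both measures, then compare rankings.
pairs = list(combinations(sorted(FEATURES), 2))
semantic = [path_similarity(a, b) for a, b in pairs]
visual = [cosine_similarity(FEATURES[a], FEATURES[b]) for a, b in pairs]
rho = spearman(semantic, visual)
print(f"Spearman rho over {len(pairs)} class pairs: {rho:.3f}")
```

A rank correlation is used here because the two measures live on different scales, so only the ordering of class pairs is compared. A value near 1 would indicate that semantically close classes also tend to look alike under the chosen measures, while a value near 0 would indicate that the assumption behind semantic methods does not hold for that data.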
