Estimating the visual variety of concepts by referring to Web popularity

Increasingly sophisticated methods for data processing demand knowledge of the semantic relationship between language and vision. New fields of research such as Explainable AI call for stepping away from black-box approaches and for understanding how the underlying semantics of datasets and AI models work. Advances in psycholinguistics suggest that there is a relationship between language perception and how language production and sentence creation work. In this paper, a method to measure the visual variety of concepts is proposed in order to quantify the semantic gap between vision and language. For this purpose, an image corpus is recomposed using ImageNet and Web data. Web-based metrics for measuring the popularity of sub-concepts are used as weights to ensure that the image composition of the dataset is as natural as possible. Using clustering methods, a score describing the visual variety of each concept is determined. A crowd-sourced survey is conducted to create ground-truth values applicable to this research. The evaluations show that the recomposed image corpus considerably improves the measured variety compared to previous datasets. The results are promising and provide additional insight into the relationship between language and vision.
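
To make the popularity weighting more concrete, the following is a minimal sketch of how an image budget for one concept might be split across its sub-concepts in proportion to a Web popularity score. It is not the authors' implementation: the function name `allocate_images`, the example sub-concepts, and the popularity values are all hypothetical, and largest-remainder rounding is an assumption chosen only so that the counts add up to the requested total.

```python
from typing import Dict


def allocate_images(popularity: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` images across sub-concepts in proportion to popularity.

    `popularity` maps a sub-concept name to a non-negative score
    (hypothetical values, e.g. Web hit counts).
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    weight_sum = sum(popularity.values())
    if weight_sum <= 0:
        raise ValueError("popularity scores must sum to a positive value")

    # Ideal fractional share of the image budget for each sub-concept.
    shares = {name: total * score / weight_sum for name, score in popularity.items()}

    # Round down first, then hand out the remaining images to the
    # sub-concepts with the largest fractional remainders.
    counts = {name: int(share) for name, share in shares.items()}
    leftover = total - sum(counts.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical popularity scores for sub-concepts of "car".
example = allocate_images({"sports car": 600, "minivan": 300, "ambulance": 100}, 10)
```

With these made-up scores, more popular sub-concepts contribute more images to the recomposed set, which is the intuition behind weighting the corpus by popularity before clustering.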
