Estimating the imageability of words by mining visual characteristics from crawled image data

Natural Language Processing and multimodal analyses are key elements in many applications. However, the semantic gap remains a persistent problem, leading to unnatural results disconnected from the user's perception. To handle semantics in multimedia applications, human perception needs to be taken into account. Imageability is a measure originating from psycholinguistics that quantifies how readily a word evokes a mental image. Research shows a relationship between language usage and the imageability of words, making it useful for multimodal applications. However, imageability datasets are typically created through manual, labor-intensive annotation. In this paper, we propose a method that estimates the imageability of words by mining a variety of visual features from Web-crawled image data. The main assumption is a relationship between the imageability of concepts, human perception, and the contents of Web-crawled images. Using a set of low- and high-level visual features extracted from Web-crawled images, a model is trained to predict imageability. Evaluations show that imageability can be predicted with both sufficiently low error and high correlation to ground-truth annotations. The proposed method can be used to extend the corpus of existing imageability dictionaries.
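As a rough illustration of the pipeline described above, the sketch below trains a regressor on per-word visual feature vectors to predict imageability ratings and reports error and correlation against the ground truth. The feature values, rating scale, corpus size, and the choice of a random-forest regressor are placeholders for illustration, not the paper's actual data or model.

```python
# Minimal sketch: predicting per-word imageability from aggregated visual features.
# Feature extraction (e.g., color statistics, local descriptors, or CNN outputs
# pooled over each word's crawled images) is assumed to have happened already;
# random placeholder features stand in for it here.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_words, n_features = 500, 64                 # placeholder corpus size and feature dimension
X = rng.normal(size=(n_words, n_features))    # one aggregated feature vector per word
y = rng.uniform(1.0, 7.0, size=n_words)       # placeholder imageability ratings (e.g., 1-7 scale)

model = RandomForestRegressor(n_estimators=200, random_state=0)

# Evaluate with mean absolute error, mirroring the low-error criterion in the abstract.
mae = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error").mean()
print(f"cross-validated MAE: {mae:.3f}")

# Correlation between predictions and ground-truth ratings, mirroring the
# correlation criterion in the abstract (in-sample here, for brevity).
model.fit(X, y)
pred = model.predict(X)
print(f"Pearson r vs. ground truth: {np.corrcoef(y, pred)[0, 1]:.3f}")
```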
