This paper reports preliminary experiments that test the conjecture that semantic compositionality is a general process, independent of the underlying modality. In particular, we model the composition of an attribute with an object in the visual modality in the same way as the composition of an adjective with a noun in the linguistic modality. Our experiments show that the concept topologies in the two modalities share similarities, a result that supports our conjecture.

1 Language and Vision

Recently, fields such as computational linguistics and computer vision have converged on a common way of capturing and representing the linguistic and visual information of atomic concepts: vector space models. At the same time, advances in computational semantics have led to effective and linguistically inspired approaches for extending such methods from single concepts to arbitrary linguistic units (e.g. phrases) by means of vector-based semantic composition (Mitchell and Lapata, 2010). Compositionality is an important component not only from a linguistic perspective but also from a cognitive one, and there have been efforts to validate it as a general cognitive process. In computer vision, however, compositionality has so far received limited attention. In this work, we therefore study the phenomenon of visual compositionality, complementing the limited previous literature, which has focused on event compositionality (St¨
[1] Toben H. Mintz, et al. Adjectives really do modify nouns: the incremental and restricted nature of early adjective acquisition. Cognition, 2002.
[2] Andrew Zisserman, et al. Image Classification using Random Forests and Ferns. 2007 IEEE 11th International Conference on Computer Vision, 2007.
[3] Marco Baroni, et al. Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space. EMNLP, 2010.
[4] Mirella Lapata, et al. Composition in Distributional Models of Semantics. Cogn. Sci., 2010.
[5] Fei-Fei Li, et al. Attribute Learning in Large-Scale Datasets. ECCV Workshops, 2010.
[6] Andrew Y. Ng, et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML, 2011.
[7] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints. 2011.
[8] Nicu Sebe, et al. (Unseen) event recognition via semantic compositionality. 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.