Clustering-based Model for Predicting Multi-spatial Relations in Images

Detecting spatial relations between objects in an image is a core task in image understanding and grounded natural language. Cognitive linguistics has addressed this problem by developing template and computational models from controlled experimental data based on 2D or 3D synthetic diagrams. The Computer Vision (CV) and Natural Language Processing (NLP) communities, in turn, have developed machine learning models for real-world images, mostly from crowd-sourced data. These latter models treat the task as single-label classification, whereas it is inherently multi-label, since more than one spatial relation can hold for the same pair of objects. In this paper, we learn a multi-label model based on computed spatial features. We implement the model with a clustering-based approach because, apart from predicting multiple labels for a given instance, it offers deeper insight into how spatial relations relate to each other. We report the results of this model and compare it directly with a Random Forest single-label classifier. The proposed model generally outperforms the single-label classifier, even when the top four prepositions predicted by the single-label classifier are considered.
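
The abstract does not spell out how clustering yields multiple labels, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that spatial feature vectors are clustered with plain k-means, that each cluster stores the fraction of its training instances annotated with each preposition, and that a new instance receives every preposition whose fraction in the nearest cluster reaches a threshold. The feature layout (a 2D offset between the two objects), the toy data, the farthest-point initialisation, and all function names are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def init_centroids(points, k):
    """Deterministic farthest-point seeding (an assumption, chosen for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=50):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        members = [[] for _ in range(k)]
        for p in points:
            members[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else centroids[i]
            for i, group in enumerate(members)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def label_profiles(points, labels, centroids):
    """For each cluster, the fraction of its training instances carrying each label."""
    counts = [Counter() for _ in centroids]
    for p, lab in zip(points, labels):
        counts[nearest(p, centroids)][lab] += 1
    return [{lab: n / sum(c.values()) for lab, n in c.items()} for c in counts]

def predict(point, centroids, profiles, threshold=0.3):
    """All labels whose share in the nearest cluster is at least threshold."""
    profile = profiles[nearest(point, centroids)]
    kept = [(lab, frac) for lab, frac in profile.items() if frac >= threshold]
    return [lab for lab, _ in sorted(kept, key=lambda t: (-t[1], t[0]))]

# Toy training set: hypothetical (dx, dy) offsets, one annotated preposition each.
points = [(0.0, 1.0), (0.1, 0.9), (-0.1, 1.1), (0.0, 0.8),
          (1.0, 0.0), (0.9, 0.1), (1.1, -0.1), (1.0, 0.1)]
labels = ["above", "on", "above", "on",
          "next to", "near", "next to", "near"]

centroids = kmeans(points, k=2)
profiles = label_profiles(points, labels, centroids)
print(predict((0.05, 0.95), centroids, profiles))
print(predict((0.95, 0.0), centroids, profiles))
```

Although every training instance here carries a single label, instances with similar geometry fall into the same cluster, so a query can be assigned several prepositions at once. The per-cluster label fractions also hint at which relations tend to co-occur, which is the kind of insight the abstract attributes to the clustering-based approach.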
