A model for interpreting social interactions in local image regions

Understanding social interactions (such as 'hug' or 'fight') is a basic and important capacity of the human visual system, but a challenging and still open problem for modeling. In this work we study the visual recognition of social interactions from small but recognizable local image regions. The approach rests on two novel key components: (i) a given social interaction can be recognized reliably from reduced images (called 'minimal images'), and (ii) recognizing a social interaction depends on identifying components and relations within the minimal image (termed 'interpretation'). We present psychophysical data for minimal images and modeling results for their interpretation. We also discuss how minimal configurations can be integrated to recognize social interactions in a detailed, high-resolution image.
