Gaze Perception in Humans and CNN-Based Model

Making accurate inferences about other individuals’ locus of attention is essential for human social interaction and will be important for AI systems that interact with humans. In this study, we compare how humans and a gaze model based on a convolutional neural network (CNN) infer the locus of attention in images of real-world scenes in which several individuals look at a common location. We show that, compared with the model's estimates, humans’ estimates of the locus of attention are more strongly influenced by scene context, such as whether the attended target is present and how many individuals appear in the image.
