Identifying Objects in Images from Analyzing the Users' Gaze Movements for Provided Tags

Assuming that eye tracking will be a common input device in notebooks and mobile devices such as iPads in the near future, it is possible to implicitly gain information about images and image regions from users' gaze movements. In this paper, we investigate the principal idea of finding specific objects shown in images by looking only at the users' gaze path information. We have analyzed 547 gaze paths from 20 subjects viewing different image-tag pairs with the task of deciding whether the tag presented is actually found in the image. By analyzing the gaze paths, we are able to correctly identify 67% of the image regions and to significantly outperform two baselines. In addition, we have investigated whether different regions of the same image can be differentiated based on the gaze information. Here, we are able to correctly identify two different regions in the same image with an accuracy of 38%.
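To illustrate the underlying idea, the following minimal Python sketch assigns each fixation of a gaze path to the candidate image region (bounding box) it falls into and picks the region with the most accumulated fixation time. The data layout, function names, and the simple "most fixation time wins" decision rule are illustrative assumptions, not the paper's actual fixation measures.

```python
# Sketch of gaze-based region identification (assumed data model, not the
# paper's exact method): fixations are (x, y, duration) tuples, candidate
# regions are axis-aligned bounding boxes keyed by tag.

from typing import Dict, List, Tuple

Fixation = Tuple[float, float, float]        # (x, y, duration in ms)
Region = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)

def fixation_time_per_region(fixations: List[Fixation],
                             regions: Dict[str, Region]) -> Dict[str, float]:
    """Sum the fixation durations falling inside each candidate region."""
    totals = {name: 0.0 for name in regions}
    for x, y, dur in fixations:
        for name, (x0, y0, x1, y1) in regions.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                totals[name] += dur
    return totals

def identify_region(fixations: List[Fixation],
                    regions: Dict[str, Region]) -> str:
    """Pick the region that attracted the most total fixation time."""
    totals = fixation_time_per_region(fixations, regions)
    return max(totals, key=totals.get)

# Example: two labeled regions and a short gaze path over the image.
regions = {"dog": (10, 10, 120, 90), "tree": (150, 20, 260, 180)}
gaze_path = [(40, 50, 180.0), (60, 55, 220.0), (200, 100, 90.0)]
print(identify_region(gaze_path, regions))   # -> "dog"
```

In practice, a rule like this would be one of several gaze features (e.g., fixation counts, first-fixation latency, dwell time) combined to decide which region a tag refers to.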
