Benefiting from users’ gaze: selection of image regions from eye tracking information for provided tags

Providing image annotations is a tedious task, and it becomes even more cumbersome when individual objects are to be annotated within the images. Such region-based annotations can be used in various ways, for example for similarity search or as training data for automatic object detection. We investigate the basic idea of finding objects in images by analyzing the gaze paths of users who view the images with an interest in a specific object. We analyzed 799 gaze paths from 30 subjects who viewed image-tag pairs with the task of deciding whether or not the tag could be found in the image. We compared 13 different fixation measures for analyzing the gaze paths. The best-performing fixation measure correctly assigns a tag to a region for 63% of the image-tag pairs and significantly outperforms three baselines. We examine characteristics of the image regions, such as their position and size, for correct and incorrect assignments. We also investigate whether aggregating multiple gaze paths from several subjects improves the precision of identifying the correct regions. In addition, we explore the possibility of discriminating different regions in the same image: from different primings, we correctly identify two regions in the same image with an accuracy of 38%.
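The abstract does not spell out the 13 fixation measures, so the following is only a minimal sketch of the general principle, not the measures evaluated in the paper. It assumes a hypothetical measure (total fixation duration inside a region), candidate regions given as rectangular boxes, and gaze paths recorded while the subject was primed with the tag. The region for the tag is the one with the highest score; when several subjects' gaze paths are available, each path's scores are normalized and summed. All names (`Fixation`, `duration_scores`, `select_region_for_tag`, the region ids) are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical region format: axis-aligned bounding box (x0, y0, x1, y1) in pixels.
# Real segmentations (e.g. polygons) would need a point-in-polygon test instead.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Fixation:
    x: float
    y: float
    duration_ms: float


def contains(box: Box, fx: Fixation) -> bool:
    x0, y0, x1, y1 = box
    return x0 <= fx.x <= x1 and y0 <= fx.y <= y1


def duration_scores(path: Sequence[Fixation], regions: Dict[str, Box]) -> Dict[str, float]:
    """Hypothetical fixation measure: total fixation duration inside each region.

    Fixations outside every region are ignored; a fixation that falls inside
    overlapping regions counts toward each of them.
    """
    scores = {name: 0.0 for name in regions}
    for fx in path:
        for name, box in regions.items():
            if contains(box, fx):
                scores[name] += fx.duration_ms
    return scores


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to sum to 1 so every gaze path (subject) gets equal weight."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total > 0 else dict(scores)


def select_region_for_tag(paths: Iterable[Sequence[Fixation]], regions: Dict[str, Box]) -> str:
    """Sum normalized scores over gaze paths recorded for one tag; return the best region id."""
    combined = {name: 0.0 for name in regions}
    for path in paths:
        for name, score in normalize(duration_scores(path, regions)).items():
            combined[name] += score
    return max(combined, key=combined.get)


if __name__ == "__main__":
    regions = {"region_a": (0, 0, 100, 100), "region_b": (150, 0, 250, 200)}
    p1 = [Fixation(50, 50, 300), Fixation(200, 100, 120), Fixation(60, 40, 250)]
    p2 = [Fixation(210, 80, 400), Fixation(40, 70, 100)]
    print(select_region_for_tag([p1], regions))      # region_a
    print(select_region_for_tag([p2], regions))      # region_b
    print(select_region_for_tag([p1, p2], regions))  # region_a
```

Normalizing per path is a design choice made here so that a subject who simply looked at the image longer does not dominate the aggregate; summing raw durations would be an equally plausible alternative.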
