Can relevance of images be inferred from eye movements?

Query formulation and efficient navigation through data to reach relevant results are undoubtedly major challenges in image and video retrieval. Good queries are typically not available, so the search has to rely on relevance feedback from the user, which makes retrieval an iterative process. Giving explicit relevance feedback is laborious, not always easy, and may even be impossible in ubiquitous computing scenarios. A central question is therefore: can scarce explicit feedback be replaced or complemented with implicit feedback inferred from various sensors not specifically designed for the task? In this paper, we present preliminary results on inferring the relevance of images from implicit feedback about the users' attention, measured with an eye-tracking device. We show that, at least in reasonably controlled setups, fairly simple features and classifiers are capable of detecting relevance from eye movements alone, without any explicit feedback.
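The abstract mentions that fairly simple gaze features and classifiers can already detect relevance. The paper does not specify the features or classifier here, so the following is only an illustrative sketch under assumed choices: per-image fixation summaries (total fixation time, fixation count, mean fixation duration) and a single-threshold classifier on total fixation time, with toy data invented for the example.

```python
import numpy as np

def extract_features(fixations):
    """Summarize a list of fixation durations (seconds) for one image.

    Feature vector (an assumption, not the paper's feature set):
    [total fixation time, number of fixations, mean fixation duration].
    """
    durations = np.asarray(fixations, dtype=float)
    if durations.size == 0:
        return np.zeros(3)
    return np.array([durations.sum(), durations.size, durations.mean()])

def train_threshold_classifier(X, y):
    """Pick the total-fixation-time threshold that best separates
    relevant (1) from non-relevant (0) images on the training data."""
    totals = X[:, 0]
    best_t, best_acc = totals[0], 0.0
    for t in np.unique(totals):
        acc = np.mean((totals >= t) == y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Toy data (invented): relevant images accumulate longer total fixation time.
gaze = [
    ([0.3, 0.4, 0.5], 1), ([0.6, 0.7], 1), ([0.2, 0.9, 0.4], 1),
    ([0.1], 0), ([0.15, 0.1], 0), ([0.2], 0),
]
X = np.array([extract_features(f) for f, _ in gaze])
y = np.array([label for _, label in gaze])

threshold = train_threshold_classifier(X, y)
predictions = (X[:, 0] >= threshold).astype(int)
accuracy = np.mean(predictions == y)
```

On this toy data the learned threshold separates the two groups perfectly; with real eye-tracking data one would of course use held-out evaluation and likely a richer classifier.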
