Extraction of read text using a wearable eye tracker for automatic video annotation

This paper presents an automatic video annotation method that utilizes the user's reading behaviour. Using a wearable eye tracker, we identify the video frames in which the user reads a text document and extract the sentences that he or she has read. The extracted sentences are used to annotate video segments captured from the user's egocentric perspective. An advantage of the proposed method is that it requires no training data, on which video annotation methods often depend. We examined the accuracy of the proposed method in a pilot study in which participants drew an illustration while reading a tutorial. The method achieved 64.5% recall and 30.8% precision.
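
The abstract does not give implementation details, so the following is only a minimal sketch of how such a pipeline could be structured. It assumes that gaze points from the wearable eye tracker have already been mapped into document coordinates and that sentence bounding boxes of the tutorial are known; the names (GazeSample, Sentence, detect_reading_frames), the rightward-drift reading heuristic, and the numeric thresholds are hypothetical illustrations, not the authors' algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class GazeSample:
    frame: int   # video frame index
    x: float     # gaze x in document coordinates (assumed already registered)
    y: float     # gaze y in document coordinates

@dataclass
class Sentence:
    text: str
    x0: float    # bounding box of the sentence in document coordinates
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

def detect_reading_frames(samples: List[GazeSample], min_run: int = 5) -> List[int]:
    """Label a frame as 'reading' when gaze drifts steadily rightward over several
    consecutive samples -- a crude proxy for the left-to-right scan path of reading."""
    reading, run = [], 0
    for prev, cur in zip(samples, samples[1:]):
        if 0 < cur.x - prev.x < 40 and abs(cur.y - prev.y) < 15:
            run += 1
        else:
            run = 0
        if run >= min_run:
            reading.append(cur.frame)
    return reading

def extract_read_sentences(samples: List[GazeSample],
                           reading_frames: List[int],
                           sentences: List[Sentence]) -> List[str]:
    """Collect the sentences whose bounding boxes were fixated during reading frames."""
    frames = set(reading_frames)
    read: List[str] = []
    for s in samples:
        if s.frame not in frames:
            continue
        for sent in sentences:
            if sent.contains(s.x, s.y) and sent.text not in read:
                read.append(sent.text)
    return read

def annotate_segment(start_frame: int, end_frame: int,
                     read_sentences: List[str]) -> Dict:
    """Attach the read sentences as the annotation of one egocentric video segment."""
    return {"start": start_frame, "end": end_frame, "annotation": read_sentences}

if __name__ == "__main__":
    # Toy example: one tutorial sentence and a synthetic left-to-right gaze sweep.
    sentences = [Sentence("Draw a rough outline first.", 0, 0, 400, 20)]
    samples = [GazeSample(frame=i, x=10 + 20 * i, y=10) for i in range(12)]
    frames = detect_reading_frames(samples)
    print(annotate_segment(frames[0], frames[-1],
                           extract_read_sentences(samples, frames, sentences)))
```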
