Recognizing Words in Scenes with a Head-Mounted Eye-Tracker

Recognition of scene text using a hand-held camera is emerging as a hot topic of research. In this paper, we investigate the use of a head-mounted eye-tracker for scene text recognition. An eye-tracker detects the position of the user's gaze. Using this gaze information, we can provide the user with more information about their region or object of interest in a ubiquitous manner. Combining a word recognition system with eye-tracking technology therefore enables a service in which the user gazes at a certain word and soon obtains information related to that word. Such a service is useful because the user has to do nothing but gaze at words of interest. With a view to realizing this service, we experimentally evaluate the effectiveness of using the eye-tracker for word recognition. Initial results show that the recognition accuracy was around 70% in our word recognition experiment and that the average computation time was less than one second per query image.
