Using text-spotting to query the world

The world we live in is labeled extensively for the benefit of humans. Yet, to date, robots have made little use of human-readable text as a resource. In this paper we aim to draw attention to text as a readily available source of semantic information in robotics. We implement a system that allows robots to read visible text in natural scene images and to use this knowledge to interpret the content of a given scene. The reliable detection and parsing of text in natural scene images is an active area of research and remains a non-trivial problem. We build on a commonly adopted approach, which uses boosting to detect text and optical character recognition (OCR) to parse it, and extend it with a probabilistic error correction scheme that incorporates a sensor model of our pipeline. To interpret the scene content, we introduce a generative model that explains spotted text in terms of arbitrary search terms. This allows the robot to estimate the relevance of a given scene with respect to arbitrary queries, for example whether it is looking at a bank or a restaurant. We present results from images recorded by a robot in a busy cityscape.
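The idea of correcting OCR output with a sensor model can be illustrated with a minimal sketch. It is not the scheme used in this paper: it assumes a toy noisy-channel setup in which each character is read correctly with a fixed probability and is otherwise replaced uniformly at random, and it considers only candidate words of the same length as the OCR output. The vocabulary `PRIOR`, the constant `P_CORRECT`, and the functions `char_likelihood` and `posterior` are hypothetical names and values chosen for illustration.

```python
# Toy noisy-channel correction of an OCR reading (illustrative assumptions only).

# Hypothetical prior over candidate words, e.g. from corpus frequencies.
PRIOR = {"bank": 0.4, "bark": 0.1, "band": 0.2, "park": 0.3}

# Assumed sensor model: a character is read correctly with probability
# P_CORRECT, otherwise it is replaced uniformly by one of the other letters.
P_CORRECT = 0.8
ALPHABET_SIZE = 26


def char_likelihood(true_ch, observed_ch):
    """P(observed character | true character) under the assumed sensor model."""
    if true_ch == observed_ch:
        return P_CORRECT
    return (1.0 - P_CORRECT) / (ALPHABET_SIZE - 1)


def posterior(observed):
    """Posterior over same-length candidate words given the OCR output."""
    scores = {}
    for word, prior in PRIOR.items():
        if len(word) != len(observed):
            continue  # substitution-only model: lengths must match
        likelihood = 1.0
        for true_ch, observed_ch in zip(word, observed):
            likelihood *= char_likelihood(true_ch, observed_ch)
        scores[word] = prior * likelihood
    total = sum(scores.values())
    if total == 0.0:
        return {}  # no candidate of matching length
    return {word: score / total for word, score in scores.items()}


if __name__ == "__main__":
    result = posterior("bamk")
    print(max(result, key=result.get))
```

Under these assumptions, a misread such as "bamk" is attributed to the candidate that combines a high prior with few character mismatches. A realistic sensor model would also need to handle insertions, deletions, and character-specific confusions.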
