A Real-Time Scene Text to Speech System

An end-to-end real-time scene text localization and recognition method is demonstrated. The method localizes textual content in images, a video or a webcam stream, performs character recognition (OCR) and "reads" it out loud using a text-to-speech engine. The method has been recently published, achieves state-of-the-art results on public datasets and is able to recognize different fonts and scripts including non-latin ones. The real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs) which has a linear computation complexity in the number of pixels in the image. Robustness to blur, noise and illumination and color variations is also demonstrated. Finally, we show effects of various control parameters.

[1]  Christof Koch,et al.  AdaBoost for Text Detection in Natural Scene , 2011, 2011 International Conference on Document Analysis and Recognition.

[2]  Andreas Dengel,et al.  ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images , 2011, 2011 International Conference on Document Analysis and Recognition.

[3]  Alan L. Yuille,et al.  Detecting and reading text in natural scenes , 2004, CVPR 2004.

[4]  Radim Sára,et al.  A Weak Structure Model for Regular Pattern Recognition Applied to Facade Images , 2010, ACCV.

[5]  Jing Zhang,et al.  Character Energy and Link Energy-Based Text Extraction in Scene Images , 2010, ACCV.

[6]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[7]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Jiri Matas,et al.  A New Class of Learnable Detectors for Categorisation , 2005, SCIA.

[9]  Jiřı́ Matas,et al.  Real-time scene text localization and recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[11]  Cheng-Lin Liu,et al.  Text Localization in Natural Scene Images Based on Conditional Random Field , 2009, 2009 10th International Conference on Document Analysis and Recognition.

[12]  Jiri Matas,et al.  Text Localization in Real-World Images Using Efficiently Pruned Exhaustive Search , 2011, 2011 International Conference on Document Analysis and Recognition.

[13]  Jiri Matas,et al.  A Method for Text Localization and Recognition in Real-World Images , 2010, ACCV.

[14]  Rainer Lienhart,et al.  Localizing and segmenting text in images and videos , 2002, IEEE Trans. Circuits Syst. Video Technol..