A skeleton based descriptor for detecting text in real scene images

In this paper, we present a new method for text extraction in real scene images. We propose first a skeleton based descriptor to describe the strokes of the text candidates that compose a spatial relation graph. We then apply the graph cuts algorithm to label the nodes of the graph as text or non-text. We finally refine the resulted text lines candidates by classifying them using a kernel SVM. To validate this approach we perform a set of tests on the public datasets ICDAR 2003 and 2011.

[1]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[2]  Jean-Michel Jolion,et al.  Object count/area graphs for the evaluation of object detection and segmentation algorithms , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[3]  L. S. Nelson,et al.  The Folded Normal Distribution , 1961 .

[4]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[5]  Rangachar Kasturi,et al.  Locating uniform-colored text in video frames , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[6]  Andreas Dengel,et al.  ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images , 2011, 2011 International Conference on Document Analysis and Recognition.

[7]  Ki-Sang Hong,et al.  Detection of curvilinear structures and reconstruction of their regions in gray-scale images , 2002, Pattern Recognit..

[8]  Ching Y. Suen,et al.  Text detection from scene images using sparse representation , 2008, 2008 19th International Conference on Pattern Recognition.

[9]  Matthieu Cord,et al.  Snoopertext: A multiresolution system for text detection in complex visual scenes , 2010, 2010 IEEE International Conference on Image Processing.

[10]  Huizhong Chen,et al.  Robust text detection in natural images with edge-enhanced Maximally Stable Extremal Regions , 2011, 2011 18th IEEE International Conference on Image Processing.

[11]  David G. Lowe,et al.  Shape Descriptors for Maximally Stable Extremal Regions , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Matthieu Cord,et al.  TEXT EXTRACTION FROM STREET LEVEL IMAGES , 2009 .

[13]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Chunheng Wang,et al.  Conditional random field for text segmentation from images with complex background , 2010, Pattern Recognit. Lett..

[15]  Ellen K. Hughes,et al.  Video OCR for digital news archive , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[16]  Simon M. Lucas,et al.  ICDAR 2003 robust reading competitions , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..