Recognition of Multi-oriented, Multi-sized, and Curved Text

Recognizing text in documents that contain multi-oriented, curved text lines with characters of various sizes is difficult. Most optical character recognition (OCR) approaches rely on layout analysis techniques, which do not work well on unstructured documents with non-homogeneous text. Previous work on recognizing non-homogeneous text typically handles only specific cases, such as horizontal and/or straight text lines and single-sized characters. In this paper, we present a general text recognition technique that handles non-homogeneous text by exploiting dynamic character grouping criteria based on the character sizes and the maximum desired string curvature. This technique can be easily integrated with classic OCR approaches to recognize non-homogeneous text. In our experiments, we compared our approach to a commercial OCR product using a variety of raster maps that contain multi-oriented, curved and straight text labels with multi-sized characters. Our evaluation showed that our approach produced accurate text recognition results and outperformed the commercial product in both word-level and character-level accuracy.
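To make the idea of size- and curvature-based grouping concrete, the following is a minimal sketch and not the method as implemented in the paper. It assumes each detected character is reduced to a center point and a size, and it grows a string greedily from a seed character. The names `Char`, `similar_size`, `close_enough`, `turning_angle`, and `grow_string`, as well as the thresholds `max_ratio`, `gap_factor`, and `max_turn`, are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from math import atan2, degrees, hypot


@dataclass(frozen=True)
class Char:
    """A detected character, reduced to its center and size (hypothetical model)."""
    x: float
    y: float
    size: float


def similar_size(a: Char, b: Char, max_ratio: float = 2.0) -> bool:
    # Size criterion: the larger character is at most max_ratio times the smaller.
    return max(a.size, b.size) <= max_ratio * min(a.size, b.size)


def close_enough(a: Char, b: Char, gap_factor: float = 1.5) -> bool:
    # Distance criterion scales with character size, so it adapts per pair.
    return hypot(a.x - b.x, a.y - b.y) <= gap_factor * max(a.size, b.size)


def turning_angle(a: Char, b: Char, c: Char) -> float:
    """Angle in degrees between directions a->b and b->c (0 means straight)."""
    h1 = atan2(b.y - a.y, b.x - a.x)
    h2 = atan2(c.y - b.y, c.x - b.x)
    d = abs(degrees(h2 - h1)) % 360
    return min(d, 360 - d)


def grow_string(seed: Char, chars: list[Char], max_turn: float = 45.0) -> list[Char]:
    """Greedily extend a string from seed in one direction.

    A candidate must pass the size and distance criteria, and once the
    string has two characters, it must not bend more than max_turn degrees.
    """
    string = [seed]
    remaining = [c for c in chars if c is not seed]
    while True:
        tail = string[-1]
        candidates = [
            c for c in remaining if similar_size(tail, c) and close_enough(tail, c)
        ]
        if len(string) >= 2:
            candidates = [
                c for c in candidates
                if turning_angle(string[-2], tail, c) <= max_turn
            ]
        if not candidates:
            return string
        # Take the nearest admissible character as the next one in the string.
        nxt = min(candidates, key=lambda c: hypot(c.x - tail.x, c.y - tail.y))
        string.append(nxt)
        remaining.remove(nxt)
```

In this sketch, a gently curving run of same-sized characters is grouped into one string, while a much larger neighbor or a character that would force a sharp bend is left out. A grouped string could then be straightened and passed to a conventional OCR engine, which is one way to read the integration mentioned above.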
