Text line extraction in free-style documents

This paper addresses text line extraction in free-style documents such as business cards, envelopes, and posters. In such documents, global properties such as character size and line direction can hardly be inferred, which exposes a serious limitation of traditional layout analysis. The line is the most prominent and the highest-level structure in our bottom-up method. First, we apply a novel intensity function based on gradient information to locate text areas, where the gradients within a window have large magnitudes and varied directions, and we split such areas into text pieces. We then build a probability model of lines composed of text pieces from statistics on training data. For an input image, we group text pieces into lines using a simulated annealing algorithm whose cost function is based on the probability model.
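The abstract does not give the form of the intensity function, so the following is only an illustrative sketch of the idea it describes: a window scores highly when its gradients are both strong and spread over many directions. The specific combination used here (mean gradient magnitude multiplied by the normalized entropy of a magnitude-weighted direction histogram) is an assumption, as are the names `gradient` and `text_intensity` and the parameters `half`, `bins`, and `min_mag`.

```python
import math

def gradient(img, y, x):
    """Central-difference gradient (gx, gy) at an interior pixel of a 2-D list."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy

def text_intensity(img, cy, cx, half=2, bins=8, min_mag=1e-6):
    """Hypothetical window score: mean gradient magnitude times direction diversity.

    The window is the (2*half+1)-sided square centred at (cy, cx); it must lie
    at least one pixel inside the image so central differences are defined.
    Diversity is the entropy of a magnitude-weighted direction histogram,
    normalized to [0, 1]. This is an assumed stand-in, not the paper's function.
    """
    hist = [0.0] * bins
    total_mag = 0.0
    count = 0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            gx, gy = gradient(img, y, x)
            mag = math.hypot(gx, gy)
            count += 1
            if mag < min_mag:
                continue  # flat pixels carry no direction information
            total_mag += mag
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    if total_mag == 0.0:
        return 0.0
    entropy = -sum((h / total_mag) * math.log(h / total_mag) for h in hist if h > 0)
    diversity = max(0.0, entropy / math.log(bins))
    return (total_mag / count) * diversity

# Small synthetic 9x9 images for comparison.
SIZE = 9
flat = [[0] * SIZE for _ in range(SIZE)]
edge = [[255 if x >= 4 else 0 for x in range(SIZE)] for y in range(SIZE)]
checker = [[255 if (x // 2 + y // 2) % 2 == 0 else 0 for x in range(SIZE)]
           for y in range(SIZE)]

for name, img in (("flat", flat), ("edge", edge), ("checker", checker)):
    print(name, text_intensity(img, 4, 4))
```

In this sketch a single straight edge has strong gradients but only one direction, so its score collapses to zero, while the checkerboard, whose gradients point in several directions as character strokes tend to, scores higher. Thresholding such a score over a sliding window would be one way to obtain candidate text areas before splitting them into pieces.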
