A Hybrid Approach for Line Segmentation in Handwritten Documents

This paper presents an approach to text line segmentation that combines connected-component-based and projection-based information to take advantage of the strengths of both methods. The proposed system first finds the baseline of each connected component. Lines are then detected by using projection information to group the baselines of the connected components that belong to the same line. Each component is assigned to a line using a distance metric chosen according to the component's size. This is one of the few studies to apply line segmentation to Ottoman documents. It also proposes a new method, Fourier curve fitting, for detecting the peaks in a projection profile. The algorithm is demonstrated on several printed and handwritten Ottoman datasets. Results show that the method segments lines in both printed and handwritten documents, under different writing conditions, with at least 92% accuracy.
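The peak-detection idea can be illustrated with a minimal sketch. It assumes a binarized page (ink = 1, background = 0) and stands in for Fourier curve fitting by keeping only the lowest harmonics of the profile's discrete Fourier transform; the abstract does not specify the authors' fitting procedure, so the function names, the `n_harmonics` parameter, and the above-the-mean peak rule are all illustrative assumptions rather than the paper's method.

```python
import numpy as np

def projection_profile(binary_image):
    """Count ink pixels in each row (assumes ink = 1, background = 0)."""
    return np.asarray(binary_image).sum(axis=1).astype(float)

def fourier_smooth(profile, n_harmonics):
    """Approximate the profile by its DC term plus the lowest n_harmonics.

    This truncated Fourier series is an assumed stand-in for the
    Fourier curve fitting named in the abstract.
    """
    spectrum = np.fft.rfft(profile)
    spectrum[n_harmonics + 1:] = 0.0  # drop all higher-frequency terms
    return np.fft.irfft(spectrum, n=len(profile))

def find_peaks(curve):
    """Return row indices of local maxima that lie above the curve's mean."""
    mean = curve.mean()
    return [
        i for i in range(1, len(curve) - 1)
        if curve[i] > curve[i - 1] and curve[i] >= curve[i + 1] and curve[i] > mean
    ]

def detect_line_rows(binary_image, n_harmonics):
    """Estimate one row index per text line from the smoothed profile."""
    profile = projection_profile(binary_image)
    return find_peaks(fourier_smooth(profile, n_harmonics))
```

Calling `detect_line_rows(page, n_harmonics=4)` on a binarized page array returns candidate line positions. The number of harmonics controls how aggressively the profile is smoothed: too few can merge neighboring lines, while too many can let stroke-level noise produce spurious peaks, so in practice it would need tuning per dataset.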
