An Efficient Skewed Line Segmentation Technique for Cursive Script OCR

Segmentation of cursive text remains a challenging phase of text recognition. In OCR systems, recognition accuracy depends directly on the quality of segmentation. Segmenting handwritten Urdu text is particularly complex because of the context sensitivity and diagonality of the script. This paper presents an algorithm that segments handwritten and printed Urdu text into lines and subsequently into ligatures. The proposed technique employs a pixel-counting approach for modified header and baseline detection: the system first removes the skew of the text page and then segments the page into lines and ligatures. The algorithm is evaluated on a manually generated dataset of printed and handwritten Urdu text. Tested separately on handwritten and printed text, it achieves line segmentation accuracies of 96.7% and 98.3%, respectively. Furthermore, the proposed line segmentation algorithm correctly extracts lines when tested on Arabic text.
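To illustrate the general idea behind pixel counting, the sketch below splits a page into lines using a horizontal projection profile: it counts ink pixels per row and treats runs of non-empty rows as text lines. This is a minimal illustration under stated assumptions, not the paper's algorithm. It assumes the page is already binarized and deskewed, and the function name `segment_lines` and the parameter `min_ink` are hypothetical. It does not perform the header and baseline detection or the ligature extraction described above.

```python
import numpy as np

def segment_lines(binary, min_ink=1):
    """Return (top, bottom) row ranges of text lines; bottom is exclusive.

    binary  : 2-D array in which nonzero values are ink pixels
              (assumed already binarized and deskewed).
    min_ink : hypothetical threshold; rows with fewer ink pixels
              are treated as gaps between lines.
    """
    # Count ink pixels in every row (horizontal projection profile).
    row_counts = (np.asarray(binary) != 0).sum(axis=1)
    is_text = row_counts >= min_ink

    lines = []
    start = None
    for y, has_ink in enumerate(is_text):
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # the current line ends
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(is_text)))
    return lines

# Toy page with two "lines" of ink separated by blank rows.
page = np.zeros((10, 8), dtype=np.uint8)
page[1:3, 2:6] = 1
page[5:8, 1:7] = 1
print(segment_lines(page))  # [(1, 3), (5, 8)]
```

On real handwritten pages, neighbouring lines often overlap vertically, so a plain row-count threshold like this tends to merge them. That is one reason a method would need additional steps such as the skew removal and baseline detection the abstract mentions.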
