Text-line segmentation of large titles and headings in Arabic-like script

Current approaches to text-line segmentation are often either highly specialized to specific domains or dependent on many parameters. In particular, large text lines such as headings and titles in Arabic-like script are not segmented correctly by state-of-the-art methods. In this work, we present a simple and robust text-line segmentation approach. The proposed method is tested on real scanned Pashto images, where it outperforms a recent text-independent state-of-the-art method in both segmentation performance and processing time.
