Knowledge-Based Baseline Detection and Optimal Thresholding for Words Segmentation in Efficient Pre-Processing of Handwritten Arabic Text

Techniques for detecting the baseline and segmenting words in handwritten Arabic text are presented in this paper. Instead of relying on projection alone, knowledge of where the baseline is located is used to detect it accurately. The distances between words and between subwords are then analyzed separately, and their statistical distributions are used to choose an optimal threshold for word segmentation. Results on the IFN/ENIT database validate the methods, showing improved baseline detection and word segmentation as pre-processing for subsequent recognition.
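The thresholding idea can be illustrated with a minimal sketch. It assumes that two labelled samples of horizontal gap widths are available, one measured between subwords of the same word and one measured between words, and it picks the threshold that misclassifies the fewest gaps in those samples. The abstract does not state the paper's actual optimality criterion or how the distributions are modelled, so this empirical-error rule, the function names `optimal_gap_threshold` and `segment_words`, and the sample values are all illustrative assumptions.

```python
from typing import List, Sequence


def optimal_gap_threshold(subword_gaps: Sequence[float],
                          word_gaps: Sequence[float]) -> float:
    """Return the gap width that best separates the two labelled samples.

    Assumption: "optimal" here means the fewest misclassified gaps in the
    samples; a gap wider than the threshold counts as a word boundary.
    """
    values = sorted(set(subword_gaps) | set(word_gaps))
    # Candidate thresholds: midpoints between consecutive observed widths.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct gap widths")

    def errors(threshold: float) -> int:
        # Subword gaps wrongly split plus word gaps wrongly merged.
        wrongly_split = sum(g > threshold for g in subword_gaps)
        wrongly_merged = sum(g <= threshold for g in word_gaps)
        return wrongly_split + wrongly_merged

    # On ties, min() keeps the smallest candidate threshold.
    return min(candidates, key=errors)


def segment_words(gaps: Sequence[float], threshold: float) -> List[List[int]]:
    """Group connected-component indices into words.

    gaps[i] is the distance between component i and component i + 1,
    taken in reading order.
    """
    words: List[List[int]] = []
    current = [0]
    for index, gap in enumerate(gaps, start=1):
        if gap > threshold:
            words.append(current)   # wide gap: close the current word
            current = [index]
        else:
            current.append(index)   # narrow gap: same word, next subword
    words.append(current)
    return words


# Hypothetical gap widths in pixels, for illustration only.
threshold = optimal_gap_threshold([2, 3, 4, 5], [9, 11, 12, 14])
words = segment_words([3, 10, 2, 12], threshold)
```

With real data the two distributions usually overlap, so the chosen threshold trades wrongly split words against wrongly merged ones; fitting parametric distributions to the gap widths would be an alternative to the empirical count used in this sketch.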
