Text line segmentation using Viterbi algorithm for the palm leaf manuscripts of Dai

Text line segmentation is a key step in an optical character recognition (OCR) system. Several common approaches, such as projection-based methods and stochastic methods, have been put forward for this task. However, most existing methods cannot be applied directly to the palm leaf manuscripts of Dai, whose images are of poor quality and contain smudges, creases, deformed strokes and touching characters. To solve this problem, an improved Viterbi algorithm based on the Hidden Markov Model (HMM) is first used to find all possible segmentation paths. A path filtering method is then used to select the optimal paths for the segmented text blocks. The method is compared with relevant existing methods, and the experimental results demonstrate its effectiveness.
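To make the idea of a Viterbi segmentation path concrete, the following is a minimal sketch, not the improved algorithm proposed here, whose details are not given in this abstract. It assumes a binarized image in which ink pixels are 1 and background pixels are 0, treats each row as an HMM state and each column as a time step, and uses the ink value as the per-pixel cost (standing in for a negative log-likelihood). The function name `viterbi_seam` and the parameter `max_step` are hypothetical and chosen only for illustration.

```python
import numpy as np

def viterbi_seam(cost, max_step=1):
    """Return, for each column, the row of the minimum-cost left-to-right path.

    cost[r, c] is the penalty for passing through pixel (r, c). Between
    adjacent columns the path may move at most max_step rows (an assumed
    transition constraint, not taken from the paper).
    """
    rows, cols = cost.shape
    acc = np.full((rows, cols), np.inf)        # best accumulated cost ending at (r, c)
    back = np.zeros((rows, cols), dtype=int)   # predecessor row, used for backtracking
    acc[:, 0] = cost[:, 0]

    # Forward pass: extend the best partial path into every pixel of each column.
    for c in range(1, cols):
        for r in range(rows):
            lo, hi = max(0, r - max_step), min(rows, r + max_step + 1)
            prev = lo + int(np.argmin(acc[lo:hi, c - 1]))
            acc[r, c] = acc[prev, c - 1] + cost[r, c]
            back[r, c] = prev

    # Backward pass: start from the cheapest end point and follow the predecessors.
    path = np.empty(cols, dtype=int)
    path[-1] = int(np.argmin(acc[:, -1]))
    for c in range(cols - 1, 0, -1):
        path[c - 1] = back[path[c], c]
    return path

# Toy image: two "text lines" (rows 0-1 and 3-4) separated by a blank row 2.
image = np.ones((5, 6))
image[2, :] = 0
print(viterbi_seam(image))  # the path stays in the blank row: [2 2 2 2 2 2]
```

In a full pipeline, one such path would be computed per candidate starting row, and a later filtering step would discard redundant or implausible paths; that filtering is described above only at a high level and is not reproduced in this sketch.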
