A Markov chain based line segmentation framework for handwritten character recognition

In this paper, we present a novel text line segmentation framework that follows the divide-and-conquer paradigm: we iteratively identify and re-process regions of ambiguous line segmentation in an input document image until no ambiguity remains. To detect ambiguous line segmentation, we introduce two complementary line descriptors, referred to as the underline and highlight line descriptors, and flag an ambiguity wherever their patterns mismatch. As a result, we can easily identify line segmentations that are already good, and we greatly simplify the original line segmentation problem by reprocessing only the ambiguous regions. We evaluate the proposed framework on the ICDAR 2009 handwritten document dataset, where its performance is close to that of the top-performing systems submitted to the competition. Moreover, the proposed method is robust to skew, noise, variable line heights, and touching characters. The proposed idea can also be applied to other text analysis tasks, such as word segmentation and page layout analysis.
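The abstract describes the framework only at a high level. As a rough illustration of the divide-and-conquer loop, and not the authors' implementation, the Python sketch below keeps a queue of regions. It accepts a region when two line descriptors induce the same grouping of components. Otherwise, it splits the region and re-processes the pieces. Several parts of the sketch are hypothetical stand-ins for details the abstract does not give: the one-dimensional representation of a region as vertical centres of connected components, the two toy descriptors (`gap_labels`, `band_labels`), and the splitting rule (`split_at_widest_gap`). In particular, the toy descriptors are not the paper's underline and highlight descriptors.

```python
from collections import deque
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical representation: a region is a tuple of component indices, and a
# labeling assigns one line label to each component, in region order.
Region = Tuple[int, ...]
Labeling = List[Hashable]


def same_partition(a: Sequence[Hashable], b: Sequence[Hashable]) -> bool:
    """True if both labelings group the components identically (label names may differ)."""
    if len(a) != len(b):
        return False
    fwd, bwd = {}, {}
    for x, y in zip(a, b):
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False
    return True


def segment(
    root: Region,
    describe_a: Callable[[Region], Labeling],
    describe_b: Callable[[Region], Labeling],
    split: Callable[[Region], List[Region]],
    fallback: Callable[[Region], Labeling],
) -> List[Tuple[Region, Labeling]]:
    """Re-process ambiguous regions until the two descriptors agree on every region."""
    accepted = []
    queue = deque([root])
    while queue:
        region = queue.popleft()
        labels_a = describe_a(region)
        if same_partition(labels_a, describe_b(region)):
            accepted.append((region, labels_a))  # unambiguous: keep this segmentation
            continue
        parts = split(region)
        if len(parts) > 1:
            queue.extend(parts)  # ambiguous: divide and re-process the pieces
        else:
            accepted.append((region, fallback(region)))  # cannot divide further
    return accepted


# --- Toy descriptors on 1-D component centres (hypothetical stand-ins only) ---
def gap_labels(ys, region, gap=4.0):
    """Start a new line whenever the vertical gap to the previous component exceeds `gap`."""
    order = sorted(region, key=lambda i: ys[i])
    labels, line = {}, 0
    for prev, cur in zip([None] + order[:-1], order):
        if prev is not None and ys[cur] - ys[prev] > gap:
            line += 1
        labels[cur] = line
    return [labels[i] for i in region]


def band_labels(ys, region, height=10.0):
    """Assign each component to a fixed-height horizontal band."""
    return [int(ys[i] // height) for i in region]


def split_at_widest_gap(ys, region):
    """Cut the region at its largest vertical gap (a guess at the likeliest line break)."""
    if len(region) < 2:
        return [region]
    order = sorted(region, key=lambda i: ys[i])
    gaps = [ys[order[k + 1]] - ys[order[k]] for k in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__) + 1
    return [tuple(order[:cut]), tuple(order[cut:])]


ys = [2.0, 3.0, 12.0, 13.0, 18.0, 19.0]  # made-up component centres
result = segment(
    tuple(range(len(ys))),
    lambda r: gap_labels(ys, r),
    lambda r: band_labels(ys, r),
    lambda r: split_at_widest_gap(ys, r),
    lambda r: gap_labels(ys, r),
)
print(result)  # [((0, 1), [0, 0]), ((2, 3), [0, 0]), ((4, 5), [0, 0])]
```

In this toy input, the fixed-band descriptor merges the components at y = 12 to 19 into a single line, but the gap-based descriptor separates them. Because the two descriptors disagree, the region containing these components is flagged and split. After two splits, both descriptors agree on every piece, and only the ambiguous part of the input is ever re-processed.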
