Handwritten and Typewritten Text Identification and Recognition Using Hidden Markov Models

This paper presents a system for identifying and recognizing handwritten and typewritten text in document images using hidden Markov models (HMMs). Text type identification first uses OCR decoding to generate word boundaries, then classifies each word as handwritten or typewritten using HMMs. We show that the contextual constraints from the HMM significantly improve identification performance over the conventional Gaussian mixture model (GMM)-based method. The identified type is then used to estimate, independently for each type, the frame sample rate and frame width of the feature sequences for the HMM OCR system. This type-dependent approach to computing the frame sample rate and frame width shows a significant improvement in OCR accuracy over type-independent approaches.
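To illustrate why contextual constraints can help word-level type identification, the following is a minimal sketch, not the system described above. It assumes one scalar feature per frame, two-state HMMs with Gaussian emissions, and hand-picked toy parameters; the names `hmm_loglik`, `gmm_loglik`, `classify_word`, and `TOY_HMMS` are hypothetical. Both toy classes share the same emission densities and differ only in their transition probabilities, so a GMM that scores frames independently cannot separate them, while the HMM forward likelihood can.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def hmm_loglik(frames, start, trans, means, variances):
    """Forward algorithm in the log domain: log P(frames | HMM)."""
    n = len(start)
    alpha = [safe_log(start[s]) + log_gauss(frames[0], means[s], variances[s])
             for s in range(n)]
    for x in frames[1:]:
        alpha = [logsumexp([alpha[r] + safe_log(trans[r][s]) for r in range(n)])
                 + log_gauss(x, means[s], variances[s])
                 for s in range(n)]
    return logsumexp(alpha)


def gmm_loglik(frames, weights, means, variances):
    """Frame-independent GMM score: the order of frames does not matter."""
    return sum(
        logsumexp([math.log(w) + log_gauss(x, m, v)
                   for w, m, v in zip(weights, means, variances)])
        for x in frames
    )


# Toy models (assumed values, chosen only for illustration). The two classes
# share emissions and differ only in how likely a state is to persist.
EMISSION_MEANS = [0.0, 1.0]
EMISSION_VARS = [0.05, 0.05]
TOY_HMMS = {
    "typewritten": {"start": [0.5, 0.5], "trans": [[0.9, 0.1], [0.1, 0.9]]},
    "handwritten": {"start": [0.5, 0.5], "trans": [[0.1, 0.9], [0.9, 0.1]]},
}


def classify_word(frames, models=TOY_HMMS):
    """Return the label whose HMM gives the word's frames the highest likelihood."""
    scores = {
        label: hmm_loglik(frames, m["start"], m["trans"],
                          EMISSION_MEANS, EMISSION_VARS)
        for label, m in models.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    steady = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]       # long runs of similar frames
    alternating = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # same values, different order
    print(classify_word(steady), classify_word(alternating))
    # The GMM scores of the two sequences agree up to rounding,
    # because they contain the same frame values.
    print(gmm_loglik(steady, [0.5, 0.5], EMISSION_MEANS, EMISSION_VARS),
          gmm_loglik(alternating, [0.5, 0.5], EMISSION_MEANS, EMISSION_VARS))
```

In a real system the frames would be multi-dimensional features extracted from each word image, and the model parameters would be trained rather than set by hand; the association of "persistent" states with typewritten text here is arbitrary.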
