Paper information - Improvements in hidden Markov model based Arabic OCR

Improvements in hidden Markov model based Arabic OCR

This paper describes recent advances in hidden Markov model (HMM) based OCR for machine-printed Arabic documents. A combination of script-independent and script-specific techniques is applied to glyph models and language models (LMs). The script-independent techniques are higher-order n-gram LMs for N-best rescoring and discriminative estimation of glyph HMMs. The Arabic-specific techniques include the use of context-dependent HMMs for glyph modeling and Parts-of-Arabic-Words in language modeling. We present experimental results that demonstrate a 40% relative reduction in word error rate over the baseline configuration on a corpus of machine-printed Arabic documents.
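As a rough illustration of what N-best rescoring with an n-gram LM involves, the sketch below re-ranks a list of recognizer hypotheses by adding a weighted LM log-probability to each hypothesis's glyph-model score. It is a toy under stated assumptions and not the paper's system: it substitutes a bigram LM with add-one smoothing for the higher-order n-gram LMs, and the function names, the `lm_weight` parameter, and all scores in the usage note are hypothetical. The `relative_reduction` helper only shows how a relative word error rate reduction is computed.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect history and bigram counts from whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        histories.update(tokens[:-1])                 # counts of each history word
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # counts of (history, word)
    vocab = {t for s in sentences for t in s.split()} | {"</s>"}
    return histories, bigrams, len(vocab)


def lm_logprob(sentence, histories, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a sentence (toy stand-in LM)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(h, w)] + 1) / (histories[h] + vocab_size))
        for h, w in zip(tokens[:-1], tokens[1:])
    )


def rescore(nbest, lm, lm_weight=1.0):
    """Return the hypothesis with the best combined score.

    nbest: list of (hypothesis_text, glyph_model_log_score) pairs.
    lm: tuple returned by train_bigram.
    lm_weight: hypothetical scaling factor for the LM score.
    """
    best = max(nbest, key=lambda hs: hs[1] + lm_weight * lm_logprob(hs[0], *lm))
    return best[0]


def relative_reduction(baseline_wer, new_wer):
    """Relative reduction in word error rate, as a fraction of the baseline."""
    return (baseline_wer - new_wer) / baseline_wer
```

For example, with an LM trained on `["the cat sat", "the cat ran", "a dog sat"]` and the made-up N-best list `[("the cat sit", -10.0), ("the cat sat", -10.5)]`, `rescore` prefers "the cat sat" even though its glyph-model score is slightly lower, because the LM penalizes the unseen word. Likewise, a drop from a hypothetical 10% word error rate to 6% gives `relative_reduction(10.0, 6.0)`, a 40% relative reduction.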
