Paper information - Unlimited Vocabulary Script Recognition Using Character N-Grams

This paper describes a robust script recognition system that uses a language model consisting of backoff character n-grams. The system is based on Hidden Markov Models (HMMs) using discrete and hybrid modeling techniques, where the latter depends on a vector quantizer trained according to the MMI criterion (an information-theory-based neural network). Recognition results are presented for two data sets: the SEDAL database of degraded English documents, such as photocopies or faxes, recognized without a dictionary, and a writer-dependent handwritten database of cursive German script samples. When language models are used, the resulting character recognition system yields significantly better recognition results for an unlimited vocabulary.
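
To make the idea of a backoff character n-gram concrete, the following is a minimal sketch of a character bigram model that backs off to unigrams for unseen character pairs. It is not the authors' implementation: the class name `BackoffCharBigram`, the absolute-discounting scheme, the discount value of 0.5, and the add-one smoothed unigrams are all assumptions chosen for illustration, and the abstract does not state the n-gram order or the smoothing method actually used.

```python
from collections import Counter, defaultdict
import math


class BackoffCharBigram:
    """Toy character bigram model with absolute discounting and
    backoff to unigrams (illustrative assumption, not the paper's model)."""

    def __init__(self, text, discount=0.5):
        self.discount = discount
        self.vocab = sorted(set(text))
        # Add-one smoothed unigram distribution over the observed alphabet.
        unigram_counts = Counter(text)
        total = len(text) + len(self.vocab)
        self.unigram = {c: (unigram_counts[c] + 1) / total for c in self.vocab}
        # Bigram counts: history character -> Counter of following characters.
        self.bigram = defaultdict(Counter)
        for prev, cur in zip(text, text[1:]):
            self.bigram[prev][cur] += 1

    def prob(self, prev, cur):
        """P(cur | prev); both characters must occur in the training text."""
        followers = self.bigram.get(prev)
        if not followers:
            # History never seen with a successor: use the unigram directly.
            return self.unigram[cur]
        history_count = sum(followers.values())
        if cur in followers:
            # Seen bigram: discounted relative frequency.
            return (followers[cur] - self.discount) / history_count
        # Unseen bigram: share the mass freed by discounting among the
        # unseen successors, in proportion to their unigram probabilities.
        reserved = self.discount * len(followers) / history_count
        unseen_mass = 1.0 - sum(self.unigram[c] for c in followers)
        return reserved * self.unigram[cur] / unseen_mass

    def log_prob(self, text):
        """Log-probability of a character string, conditioning each
        character on its predecessor (the first character is not scored)."""
        return sum(math.log(self.prob(p, c)) for p, c in zip(text, text[1:]))


model = BackoffCharBigram("the cat sat on the mat")
print(model.prob("t", "h"))   # bigram seen in training
print(model.prob("t", "c"))   # unseen bigram, scored via backoff
print(model.log_prob("the mat"))
```

Because every character sequence over the alphabet receives a nonzero probability, a model of this kind can score words that never appeared in training, which is what allows recognition without a fixed dictionary.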

Gerhard Rigoll | Anja Brakensiek | Daniel Willett

[1] Gerhard Rigoll, et al. A new hybrid approach to large vocabulary cursive handwriting recognition, 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[2] John Illingworth, et al. The advantage of using an HMM-based approach for faxed word recognition, 1998, International Journal on Document Analysis and Recognition.

[3] Joachim M. Gloger, et al. A comparison of Gaussian distribution and polynomial classifiers in a hidden Markov model based system for the recognition of cursive script, 1997, Proceedings of the Fourth International Conference on Document Analysis and Recognition.

[4] Christoph Neukirchen, et al. DUcoder-the Duisburg University LVCSR stackdecoder, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[5] Gerhard Rigoll, et al. Vergleich verschiedener statistischer Modellierungsverfahren für die On- und Off-line Handschriftenerkennung, 1999, DAGM-Symposium.

[6] Ronald Rosenfeld, et al. Statistical language modeling using the CMU-Cambridge toolkit, 1997, EUROSPEECH.

[7] Volker Märgner, et al. Script recognition using inhomogeneous P2DHMM and hierarchical search space reduction, 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).

[8] Torsten Caesar, et al. Preprocessing and feature extraction for a handwriting recognition system, 1993, Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93).

[9] Marwan A. Jabri, et al. Low resolution, degraded document recognition using neural networks and hidden Markov models, 1998, Pattern Recognit. Lett.

[10] Richard M. Schwartz, et al. An Omnifont Open-Vocabulary OCR System for English and Arabic, 1999, IEEE Trans. Pattern Anal. Mach. Intell.

[11] L. Rabiner, et al. An introduction to hidden Markov models, 1986, IEEE ASSP Magazine.

[12] H. Niemann, et al. A HMM-based System for Recognition of Handwritten Address Words, 1999.