Fast script word recognition with very large vocabulary

An algorithm for fast processing of large lexica is presented for an HMM-based script word recognition system. It consists of two steps: first, a lexicon-free recognition is performed; second, a tree search is run on the intermediate result of the first step, the trellis of probabilities. The computational effort of recognition itself is thus reduced in the first step, while recognition accuracy is preserved by using the detailed trellis information in the second step. A speedup factor of up to 15× was obtained compared to traditional tree recognition, making script word recognition with large lexica available to time-critical tasks such as postal automation, where lexica containing, e.g., all city or street names (20 k to 500 k entries) have to be processed within a few milliseconds.
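The two-step idea can be illustrated with a minimal sketch. It is not the system described here, only a toy under stated assumptions: the lexicon-free first step is assumed to have already produced a trellis holding one log-probability per character and frame, each character is assumed to occupy at least one frame, and the names `build_trie`, `extend`, and `search` are hypothetical. The second step walks a prefix tree of the lexicon so that words sharing a prefix reuse the same dynamic-programming column.

```python
import math

NEG_INF = float("-inf")

def build_trie(words):
    """Build a prefix tree; the key None marks the end of a lexicon word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word
    return root

def extend(prev, ch, trellis):
    """Extend a prefix by one character.

    prev[t] is the best log-score of the prefix consuming exactly the
    first t frames; the returned column has the same meaning for the
    prefix plus ch. Every character must consume at least one frame.
    """
    new = [NEG_INF] * len(prev)
    for t in range(1, len(prev)):
        # Either stay in ch (new[t-1]) or enter ch from the parent prefix (prev[t-1]).
        best_before = max(new[t - 1], prev[t - 1])
        new[t] = trellis[t - 1].get(ch, NEG_INF) + best_before
    return new

def search(trie, trellis):
    """Depth-first tree search over the trellis; returns {word: log-score}."""
    start = [0.0] + [NEG_INF] * len(trellis)  # empty prefix consumes 0 frames
    scores = {}
    stack = [(trie, start)]
    while stack:
        node, col = stack.pop()
        for ch, child in node.items():
            if ch is None:
                scores[child] = col[-1]  # word must consume all frames
                continue
            new_col = extend(col, ch, trellis)
            if max(new_col) > NEG_INF:  # prune subtrees that cannot match
                stack.append((child, new_col))
    return scores

# Assumed toy trellis: three frames, log-probabilities for characters "a" and "b".
trellis = [
    {"a": math.log(0.9), "b": math.log(0.1)},
    {"a": math.log(0.5), "b": math.log(0.5)},
    {"a": math.log(0.2), "b": math.log(0.8)},
]
scores = search(build_trie(["ab", "ba", "a"]), trellis)
best_word = max(scores, key=scores.get)
```

In this sketch the cost of the second step depends on the number of tree nodes rather than on the number of words, which is the property a tree search over a precomputed trellis relies on. A real system would work on HMM state probabilities and would typically add beam pruning.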
