Advanced state clustering for very large vocabulary HMM-based on-line handwriting recognition

This paper presents novel methods for introducing context-dependent hidden Markov models (HMMs) to on-line handwriting recognition. These so-called n-graphs can substantially improve modeling accuracy, but they require intelligent parameter reduction methods (state clustering). This is especially true for the very large vocabulary system investigated here, which has an active vocabulary of 200000 words. Given the underlying vocabulary, switching from context-independent to context-dependent models leads in the worst case to 25000 HMMs, most of which are very poorly trainable. The investigations therefore focus on an appropriate state clustering method that is supported by decision trees, together with some new self-organizing approaches for generating the required trees. The comparison also takes the different context dependencies (left, right, or both sides) into consideration.
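
To illustrate why the number of models grows once context is taken into account, the following minimal sketch counts the distinct character models a vocabulary induces under no context, left context only, and both-sided context. It is not the paper's system: the function name `context_models`, the toy five-word vocabulary, and the use of `None` for a missing context are assumptions made purely for illustration.

```python
from collections import Counter


def context_models(words, left=True, right=True):
    """Count context-dependent character models (n-graphs) in a word list.

    Each model is keyed as (left_context, char, right_context). A context
    that is switched off, or that falls outside the word, is stored as None.
    """
    counts = Counter()
    for word in words:
        for i, ch in enumerate(word):
            l = word[i - 1] if left and i > 0 else None
            r = word[i + 1] if right and i + 1 < len(word) else None
            counts[(l, ch, r)] += 1
    return counts


# Hypothetical toy vocabulary, standing in for the real 200000-word lexicon.
toy_vocabulary = ["hand", "band", "land", "hard", "word"]

mono = context_models(toy_vocabulary, left=False, right=False)
left_only = context_models(toy_vocabulary, right=False)
both = context_models(toy_vocabulary)

for name, models in [("no context", mono), ("left", left_only), ("both", both)]:
    # Models seen only once are the ones that would be hard to train.
    rare = sum(1 for c in models.values() if c == 1)
    print(f"{name}: {len(models)} models, {rare} seen only once")
```

Even on this toy list, the both-sided inventory is larger than the context-independent one, and many of its models occur only once. That is the kind of data sparsity that state clustering is meant to address by sharing parameters among similar contexts.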
