Tree-based state clustering using self-organizing principles for large vocabulary on-line handwriting recognition

The introduction of trigraphs offers a powerful way to improve the accuracy of handwriting models. A trigraph is a hidden Markov model (HMM) for a specific character in the context of a defined left and right neighboring character. In large-vocabulary systems such as the one investigated here, the number of unseen trigraphs (those for which no training samples are available) far exceeds the number of seen trigraphs. This paper presents a novel approach that allows unseen trigraphs to be synthesized from seen ones. With the proposed method, a mean relative error reduction of 42% was obtained on a writer-dependent system, resulting in an overall word recognition rate of 94.1%.
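To make the seen/unseen imbalance concrete, the following minimal Python sketch enumerates the trigraphs that occur in a toy training vocabulary and compares that count with the number of possible trigraphs over a lowercase alphabet. The word-boundary symbol `#`, the function names, the alphabet and the word list are hypothetical choices for illustration only. The sketch does not implement the paper's tree-based state clustering, which is the part that actually synthesizes models for unseen trigraphs.

```python
BOUNDARY = "#"  # hypothetical symbol used as left/right context at word edges


def trigraphs(word):
    """Return (left, center, right) context triples for each character of a word."""
    padded = BOUNDARY + word + BOUNDARY
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def seen_trigraphs(training_words):
    """Collect the distinct trigraphs that have at least one training occurrence."""
    seen = set()
    for word in training_words:
        seen.update(trigraphs(word))
    return seen


def count_possible(alphabet):
    """Count all trigraphs: any center character with any left/right context.

    The context set is the alphabet plus the boundary symbol.
    """
    contexts = len(alphabet) + 1
    return len(alphabet) * contexts * contexts


if __name__ == "__main__":
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    training = ["the", "then", "hen"]  # hypothetical toy training vocabulary
    seen = seen_trigraphs(training)
    possible = count_possible(alphabet)
    print(len(seen), possible, possible - len(seen))  # seen, possible, unseen
```

Even with this tiny vocabulary only 6 of 18,954 possible trigraphs are seen. A real recognizer's character set (upper case, digits, punctuation) makes the gap far larger, which is why unseen trigraph models must be derived from seen ones rather than trained directly.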