A Novel Short Merged Off-line Handwritten Chinese Character String Segmentation Algorithm Using Hidden Markov Model

The hidden Markov model (HMM) is a widely used method for segmenting sequential data in speech recognition and DNA sequence analysis. By the same principle, it can also be used to segment short merged off-line handwritten Chinese character strings, a difficult problem that arises often in practice. Because HMMs are still uncommon in this field, this paper introduces a novel HMM-based algorithm for this segmentation problem. The algorithm achieves practically usable performance even when the 3755 character classes are compressed into similar-character classes numbering only 1% of the original classes, and it also shows considerable potential for segmenting long text lines.
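To make the general principle concrete, the following is a minimal sketch of HMM-based segmentation by Viterbi decoding. It is not the algorithm proposed in this paper: the two states (`char`, `gap`), the binary ink-density observations per image column, and all probabilities are hypothetical values chosen only for illustration.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for obs (log-space Viterbi)."""
    # score[s]: best log-probability of any state path ending in s at the current step
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # one dict per step: best predecessor of each state
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            score[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            ptr[s] = best
        back.append(ptr)
    # Trace the best path backwards from the best final state
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def cut_points(path):
    """Indices where the decoded state changes, i.e. candidate segment boundaries."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]

# Hypothetical toy model (assumption, not taken from the paper):
# observation 1 = column with much ink, 0 = column with little ink.
STATES = ["char", "gap"]
START_P = {"char": 0.9, "gap": 0.1}
TRANS_P = {"char": {"char": 0.8, "gap": 0.2},
           "gap": {"char": 0.4, "gap": 0.6}}
EMIT_P = {"char": {1: 0.9, 0: 0.1},
          "gap": {1: 0.1, 0: 0.9}}

columns = [1, 1, 1, 0, 0, 1, 1, 1]  # made-up column observations
path = viterbi(columns, STATES, START_P, TRANS_P, EMIT_P)
print(cut_points(path))  # -> [3, 5]
```

In this toy setting the decoded state sequence switches from `char` to `gap` and back, and the switch positions serve as candidate cut points. Merged handwritten characters would need far richer states and observations than this two-state model.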