Improving HMM-Based Chinese Handwriting Recognition Using Delta Features and Synthesized String Samples

The HMM-based segmentation-free strategy for Chinese handwriting recognition has the advantage that it can be trained without annotation of character boundaries. However, its recognition performance has been limited by the small number of string samples available for training. In this paper, we explore two techniques to improve the performance. First, Delta features are added to the static features to alleviate the conditional independence assumption of HMMs. Second, we investigate techniques for synthesizing string samples from isolated character images. We show that synthesizing linguistically natural string samples makes insufficient use of the isolated samples. Instead, we draw character samples without replacement and concatenate them into string images, inserting between-character gaps. Our experimental results demonstrate that both Delta features and synthesized string samples significantly improve the recognition performance. When these techniques are combined with a bigram language model, the recognition accuracy increases by 36-38% compared with our previous system.
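The two techniques can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the abstract does not give the Delta formula, so the sketch assumes the common regression form d_t = sum_k k (c_{t+k} - c_{t-k}) / (2 sum_k k^2) with frames clamped at the sequence edges. The names `delta_features` and `synthesize_strings`, the parameters `window`, `chars_per_string`, and `gap_range`, the uniform gap distribution, the zero-valued background, and the requirement that all character images share one height are likewise assumptions made for illustration.

```python
import random


def delta_features(frames, window=2):
    """Append regression-based delta coefficients to each static frame.

    frames: list of equal-length feature vectors (lists of floats).
    window: number of frames on each side used in the regression (assumed).
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        delta = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, n - 1)][d]  # clamp at the last frame
                behind = frames[max(t - k, 0)][d]     # clamp at the first frame
                num += k * (ahead - behind)
            delta.append(num / denom)
        out.append(list(frames[t]) + delta)
    return out


def synthesize_strings(samples, chars_per_string, gap_range=(2, 6), rng=None):
    """Build string images from isolated character samples.

    samples: list of (label, image); image is a list of pixel rows and all
             images are assumed to share the same height.
    Each sample is used exactly once (drawing without replacement).
    Returns a list of (label_string, image) pairs.
    """
    rng = rng or random.Random()
    pool = list(samples)
    rng.shuffle(pool)  # one shuffled pass draws every sample exactly once
    strings = []
    for start in range(0, len(pool), chars_per_string):
        group = pool[start:start + chars_per_string]
        height = len(group[0][1])
        rows = [[] for _ in range(height)]
        labels = []
        for i, (label, image) in enumerate(group):
            if i > 0:
                # Insert a between-character gap of background (0) pixels.
                gap = rng.randint(gap_range[0], gap_range[1])
                for row in rows:
                    row.extend([0] * gap)
            for row, src in zip(rows, image):
                row.extend(src)
            labels.append(label)
        strings.append(("".join(labels), rows))
    return strings
```

Because the pool is shuffled once and then cut into consecutive groups, no isolated sample is left unused and none is repeated, which is the property that motivates sampling without replacement here; the character order in each synthesized string is random rather than linguistically natural.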
