Topic Language Model Adaption for Recognition of Homologous Offline Handwritten Chinese Text Image

Because the content of a full text page usually focuses on a specific topic, a topic language model adaption method is proposed to improve the recognition of homologous offline handwritten Chinese text images. First, the text images are recognized with a character-based bi-gram language model. Second, the topic of each text image is matched adaptively. Finally, the text image is recognized again with the best-matched topic language model. To trade off recognition performance against computational complexity, a restricted topic language model adaption method is further presented. The methods were evaluated on 100 offline Chinese text images. Compared with the general language model, topic language model adaption reduces the error rate by 11.94% in relative terms. The restricted topic language model cuts the running time by 19.22% at the cost of a 0.35% loss in accuracy.
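
The three-step procedure described in the abstract (first-pass recognition, topic matching, second-pass recognition) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that topic matching picks the topic whose character bi-gram model gives the first-pass transcript the highest likelihood, and that the second pass rescores candidate transcripts by adding a weighted language model score to a classifier score. The function names, the add-alpha smoothing, the toy English corpora, and the candidate scores are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_bigram(text):
    """Collect context and bi-gram counts over the characters of a corpus."""
    contexts = Counter(text[:-1])           # every character that has a successor
    bigrams = Counter(zip(text, text[1:]))  # adjacent character pairs
    return contexts, bigrams


def log_prob(text, model, vocab_size, alpha=1.0):
    """Add-alpha smoothed character bi-gram log-probability of a string."""
    contexts, bigrams = model
    total = 0.0
    for prev, cur in zip(text, text[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = contexts[prev] + alpha * vocab_size
        total += math.log(num / den)
    return total


def match_topic(first_pass_text, topic_models, vocab_size):
    """Assumed matching rule: the topic whose model best explains the first pass."""
    return max(
        topic_models,
        key=lambda t: log_prob(first_pass_text, topic_models[t], vocab_size),
    )


def rescore(candidates, model, vocab_size, lm_weight=1.0):
    """Second pass: pick the candidate with the best combined score.

    candidates is a list of (text, classifier_log_score) pairs.
    """
    best = max(
        candidates,
        key=lambda c: c[1] + lm_weight * log_prob(c[0], model, vocab_size),
    )
    return best[0]


# Toy topic corpora (hypothetical; real models would be trained on Chinese text).
corpora = {
    "sports": "the team won the match the team played the game",
    "finance": "the bank raised the rate the bank sold the bond",
}
vocab_size = len(set("".join(corpora.values())))
topic_models = {name: train_bigram(text) for name, text in corpora.items()}

# Step 1 is assumed to have produced this first-pass transcript.
first_pass = "the bank rate"

# Step 2: match the topic of the page.
topic = match_topic(first_pass, topic_models, vocab_size)

# Step 3: recognize again, here by rescoring two candidate transcripts
# with the matched topic model. The classifier scores are made up.
candidates = [("the bank", -2.0), ("the bamk", -1.8)]
final_text = rescore(candidates, topic_models[topic], vocab_size)
```

In this toy setup the classifier alone slightly prefers the misrecognized candidate, and the matched topic model is what tips the decision toward the candidate whose character sequence is common in that topic. The restricted variant mentioned in the abstract is not sketched here, because the abstract does not say how the restriction is applied.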
