Learning confidence transformation for handwritten Chinese text recognition

Handwritten text recognition systems commonly combine character classification confidence scores with context models to evaluate candidate segmentation-recognition paths, and the classification confidence is usually optimized at the character level. In this paper, we investigate different confidence-learning methods for handwritten Chinese text recognition and propose a string-level confidence-learning method, which estimates confidence parameters by directly optimizing the performance of character string recognition. We first compare the performance of parametric (class-dependent and class-independent parameters) and nonparametric (isotonic regression) confidence-learning methods. Then, we propose two regularized confidence estimation methods and, in particular, a string-level confidence-learning method under the minimum classification error criterion. In experiments on online handwritten Chinese text recognition, the string-level confidence-learning method is shown to effectively improve string recognition performance. With three character classifiers, the character correct rates improve from 92.39%, 90.24%, and 88.69% to 92.76%, 90.91%, and 89.93%, respectively.
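To make the distinction between parametric and nonparametric confidence learning concrete, the following is a minimal sketch, not the method of the paper. It assumes a sigmoid as the class-independent parametric transform (the abstract does not state the functional form) and uses the standard pool-adjacent-violators procedure for isotonic regression. All function names, the parameters `alpha` and `beta`, and the toy data are hypothetical.

```python
import math


def sigmoid_confidence(score, alpha, beta):
    """Parametric transform (assumed sigmoid form) with class-independent
    parameters alpha and beta; maps a raw classifier score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(alpha * score + beta)))


def isotonic_fit(scores, labels):
    """Nonparametric transform: fit a nondecreasing step function of the
    score to 0/1 correctness labels using pool-adjacent-violators."""
    blocks = []  # each block holds [sum of labels, count, largest score]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    # Return (upper score bound, fitted confidence) for each block.
    return [(upper, total / count) for total, count, upper in blocks]


def isotonic_confidence(model, score):
    """Look up the fitted confidence of the first block covering the score."""
    for upper, value in model:
        if score <= upper:
            return value
    return model[-1][1]


# Toy data (made up): raw scores and whether the top candidate was correct.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
toy_labels = [0, 1, 0, 1, 1, 1]
toy_model = isotonic_fit(toy_scores, toy_labels)
```

In this sketch the parametric transform has only two free parameters, whereas the isotonic fit adapts its shape to the data under a monotonicity constraint. A string-level method, as described in the abstract, would instead tune the confidence parameters against the string recognition objective rather than against per-character correctness labels.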
