Chinese Handwriting Recognition: An Algorithmic Perspective

Designing machines that can read handwriting as humans do has been an ambitious goal for more than half a century, driving talented researchers to explore diverse approaches. Obstacles that at first appeared insurmountable have often been overcome before long, yet some open issues remain. As an indispensable branch of the field, Chinese handwriting recognition has been termed one of the most difficult pattern recognition tasks. It poses its own unique challenges, such as huge variations in strokes, diversity of writing styles, and a large set of confusable categories. With ever-increasing training data, researchers have pursued elaborate algorithms to discriminate between characters of different categories and to compensate for sample variations within the same category. As a result, Chinese handwriting recognition has evolved substantially and has produced notable achievements. This book introduces the integral algorithms used in Chinese handwriting recognition and the applications of Chinese handwriting recognizers. The first part of the book covers both widely used canonical algorithms for building a reliable recognizer and newly developed scalable methods. The recognition of Chinese handwritten text is presented systematically, including instructive guidelines for collecting samples, novel recognition paradigms, distributed discriminative learning of appearance models, and distributed estimation of contextual models for large categories, in addition to celebrated methods such as gradient features, MQDF, and HMMs. The second part of the book seeks to create a friendlier human-machine interface through applications of Chinese handwriting recognition. Four scenarios are exemplified: grid-assisted input, shortest moving input, handwritten micro-blog, and instant handwriting messenger.
Throughout, the book moves from basic to more complex approaches and provides an annotated list of further reading.
