Self-training Strategies for Handwriting Word Recognition

Handwriting recognition is an emerging subfield of human-computer interaction with many potential industrial applications, e.g., in postal automation, bank check processing, and automatic form reading. Training a recognizer, however, requires a substantial number of training examples together with their corresponding ground truth, which must be created by humans. Semi-supervised learning, in which both transcribed and untranscribed text are used for training, offers a promising way to significantly reduce this effort and hence cut system development costs. However, there is still no straightforward, established approach to semi-supervised learning, particularly for handwriting recognition. In the self-training approach, an initially trained recognition system creates a new training set from unlabeled data, and a new recognizer is then trained on this set. The training set is built by selecting elements of the unlabeled set according to their recognition confidence. The success of self-training depends crucially on the data selected. In this paper, we test and compare different rules for selecting new training data for single word recognition, with and without additional language information in the form of a dictionary. We demonstrate that self-training can substantially increase the recognition accuracy of both systems.
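The self-training loop described above can be sketched as follows. This is a minimal, hypothetical illustration and not the recognizer used in the paper. Its assumptions are as follows:

- A toy nearest-centroid classifier over feature vectors stands in for the handwriting word recognizer.
- Confidence is taken to be the softmax of negative squared distances.
- The two selection rules (`select_threshold`, `select_top_fraction`) are illustrative confidence-based rules, not the specific rules evaluated in this work.
- No dictionary constraint is modeled.

```python
# Hypothetical sketch of confidence-based self-training (not the paper's system).
# A toy nearest-centroid classifier stands in for a handwriting word recognizer.
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Dict[str, Point]
Prediction = Tuple[Point, str, float]  # (input, predicted label, confidence)


def train_centroids(data: Sequence[Tuple[Point, str]]) -> Model:
    """Train the toy recognizer: one mean feature vector per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def recognize(model: Model, x: Point) -> Tuple[str, float]:
    """Return the best label and its softmax confidence over negative squared distances."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in model.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    best = max(scores, key=scores.get)
    return best, math.exp(scores[best] - top) / z


def select_threshold(preds: List[Prediction], tau: float = 0.9) -> List[Tuple[Point, str]]:
    """Hypothetical rule: keep every prediction whose confidence is at least tau."""
    return [(x, y) for x, y, c in preds if c >= tau]


def select_top_fraction(preds: List[Prediction], frac: float = 0.5) -> List[Tuple[Point, str]]:
    """Hypothetical rule: keep the most confident fraction of predictions."""
    ranked = sorted(preds, key=lambda p: p[2], reverse=True)
    return [(x, y) for x, y, _ in ranked[: int(len(ranked) * frac)]]


def self_train(
    labeled: Sequence[Tuple[Point, str]],
    unlabeled: Sequence[Point],
    select: Callable[[List[Prediction]], List[Tuple[Point, str]]],
    rounds: int = 3,
) -> Model:
    """Train on labeled data, then repeatedly relabel and retrain on selected data."""
    model = train_centroids(labeled)
    for _ in range(rounds):
        preds = [(x, *recognize(model, x)) for x in unlabeled]
        # Each round retrains on the labeled set plus the newly selected examples.
        model = train_centroids(list(labeled) + select(preds))
    return model


if __name__ == "__main__":
    labeled = [((0.0,), "a"), ((10.0,), "b")]
    unlabeled = [(1.0,), (2.0,), (5.0,), (8.0,), (9.0,)]
    # (5.0,) is equidistant from both centroids (confidence 0.5), so it is never selected.
    print(self_train(labeled, unlabeled, select_threshold))  # {'a': (1.0,), 'b': (9.0,)}
```

In this sketch, each round relabels the whole unlabeled set with the current model. It then retrains on the original labeled data plus only the newly selected examples, rather than accumulating selections across rounds. This is one common design choice, assumed here for simplicity. A different selection rule can be swapped in without touching the loop, e.g. `self_train(labeled, unlabeled, lambda p: select_top_fraction(p, 0.3))`.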
