Co-training for Handwritten Word Recognition

To cope with the tremendous variation in writing styles between individuals, unconstrained automatic handwriting recognition systems need to be trained on large sets of labeled data. Traditionally, the training data has to be labeled manually, which is a laborious and costly process. Semi-supervised learning techniques offer methods to utilize unlabeled data, which can be obtained cheaply in large amounts, in order to reduce the need for labeled data. In this paper, we propose the use of Co-Training for improving the recognition accuracy of two weakly trained handwriting recognition systems. The first one is based on Recurrent Neural Networks, while the second one is based on Hidden Markov Models. On the IAM off-line handwriting database, we demonstrate that a significant increase in recognition accuracy can be achieved with Co-Training for single word recognition.
