Self-training for Handwritten Text Line Recognition

Off-line handwriting recognition is the task of automatically recognizing handwritten text from images, for example from scanned sheets of paper. Because writing styles vary tremendously between individuals, this is a very challenging task. Traditionally, a recognition system is trained on a large corpus of handwritten text that has to be transcribed manually, which is a laborious and costly process. Recent work has proposed semi-supervised learning, which reduces the need for manually transcribed text by adding large amounts of untranscribed handwritten text to the training set. To the knowledge of the authors, the current paper is the first to propose semi-supervised learning for unconstrained handwritten text line recognition. We demonstrate the applicability of self-training, a form of semi-supervised learning, to neural-network-based handwriting recognition. Through a set of experiments we show that text without transcription can successfully be used to significantly increase the performance of a handwriting recognition system.
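
The general idea of self-training can be illustrated with a minimal sketch. This is not the authors' system: a toy nearest-centroid classifier stands in for the neural network recognizer, feature vectors stand in for text line images, and the names `train_centroids`, `predict`, `self_train`, and the `threshold` parameter are hypothetical choices made for illustration. The loop trains on the transcribed data, labels the untranscribed data with the current model, keeps only the confident pseudo-labels, and retrains.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, ...]
Model = Dict[str, Sample]


def train_centroids(data: Sequence[Tuple[Sample, str]]) -> Model:
    """Toy stand-in for recognizer training: one mean vector per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(model: Model, x: Sample) -> Tuple[str, float]:
    """Return (label, confidence) using the two nearest centroids.

    With two or more classes the confidence lies in [0.5, 1].
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, y)
        for y, c in model.items()
    )
    best_d, best_y = dists[0]
    if len(dists) == 1:
        return best_y, 1.0
    total = best_d + dists[1][0]
    conf = 1.0 - best_d / total if total > 0 else 0.5
    return best_y, conf


def self_train(
    labeled: Sequence[Tuple[Sample, str]],
    unlabeled: Sequence[Sample],
    threshold: float = 0.75,   # assumed retraining rule: keep confident outputs only
    max_iters: int = 5,
) -> Model:
    """Generic self-training: pseudo-label, filter by confidence, retrain."""
    model = train_centroids(labeled)
    for _ in range(max_iters):
        pseudo = []
        for x in unlabeled:
            y, conf = predict(model, x)
            if conf >= threshold:
                pseudo.append((x, y))
        new_model = train_centroids(list(labeled) + pseudo)
        if new_model == model:  # nothing changed, so stop early
            break
        model = new_model
    return model


if __name__ == "__main__":
    labeled = [((0.0,), "a"), ((10.0,), "b")]
    unlabeled = [(1.0,), (2.0,), (9.0,), (8.0,), (5.0,)]
    print(self_train(labeled, unlabeled))
```

In this toy run the ambiguous sample at 5.0 never passes the confidence threshold, so it is never added to the training set. How strict such a filter should be is a design choice; the sketch fixes one arbitrary value.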
