ICDAR 2015 competition HTRtS: Handwritten Text Recognition on the tranScriptorium dataset

This paper describes the second edition of the Handwritten Text Recognition (HTR) contest on the tranScriptorium datasets, held in the context of the International Conference on Document Analysis and Recognition (ICDAR) 2015. Two tracks with different conditions on the use of training data were proposed. Nine research groups registered for the contest, but only three submitted results. The handwritten images for this contest were drawn from the English “Bentham collection” dataset used in the tranScriptorium project. A small subset of this collection was chosen for the present HTR competition. The selected subset was written by several hands and entails significant variability and difficulty regarding the quality of text images, writing styles and crossed-out text. This contest is clearly more difficult than the first edition, both for training and for testing. A portion of the training dataset and the full test dataset were provided in the form of carefully segmented line images, along with the corresponding transcripts. Another portion of the training dataset was provided as raw images and their corresponding transcripts at region level. The three participants achieved good results, with transcription word error rates ranging from 31% to 44%.
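
For readers unfamiliar with the metric, the word error rate quoted above is conventionally the word-level edit distance between a recognized line and its reference transcript, divided by the number of reference words. The following is a minimal sketch of that conventional definition, not the contest's official scoring tool. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and the contest may have normalized case or punctuation differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a hypothesis with one substituted word and one missing word against a four-word reference scores 2/4 = 0.5. In practice the error counts are usually accumulated over all test lines before dividing, rather than averaging per-line rates.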
