ICFHR2016 Competition on Handwritten Text Recognition on the READ Dataset

This paper describes the Handwritten Text Recognition (HTR) competition on the READ dataset, held in the context of the International Conference on Frontiers in Handwriting Recognition 2016. The competition aims to bring together researchers working on off-line HTR and to provide them with a suitable benchmark for comparing their techniques on the task of transcribing typical historical handwritten documents. Two tracks were proposed, with different conditions on the use of training data. Ten research groups registered for the competition, but only five submitted results. The handwritten images were drawn from the German Ratsprotokolle collection, which consists of minutes of council meetings held from 1470 to 1805 and is used in the READ project. The selected dataset was written by several hands and entails significant variability and difficulty. The five participants achieved good results, with transcription word error rates ranging from 21% to 47%.
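For readers unfamiliar with the metric, word error rate (WER) is commonly defined as the word-level edit distance between a reference transcription and a system hypothesis, divided by the number of reference words. The following is a minimal sketch under stated assumptions: it tokenizes on whitespace and compares words case-sensitively, which may differ from the competition's actual scoring protocol (not given in the abstract). The function name `word_error_rate` is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization and case-sensitive matching;
    the competition's own normalization rules are not stated here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A corpus-level WER is usually obtained by summing the edit distances over all lines and dividing by the total number of reference words, rather than averaging per-line rates.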
