Transcription-Free LSTM OCR Model Evaluation

In recent years there has been significant progress in the field of Optical Character Recognition (OCR), largely due to the adoption of LSTM-based architectures. In the classic supervised training setup for LSTM-based OCR, the available image data and corresponding transcriptions are split into a training, a validation, and a test set. Generating these transcriptions can be very costly, especially for historical documents, so it is desirable to minimize the amount of transcribed data required, or to maximize the size of the training set in order to produce better models. We propose a novel method for evaluating LSTM OCR models without requiring transcription ground truth. We employ a second LSTM in an encoder-decoder setup to recreate the image data from the OCR output, and we evaluate the model by the difference between this reconstruction and the original input. We show that this approach performs similarly to traditional transcription-based evaluation on a historical document from the 16th century.
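The core of the proposed evaluation can be sketched as follows: feed each candidate model's OCR output to a decoder that recreates the line image, then score the model by the distance between the reconstruction and the original input, so that no transcription ground truth is needed. The sketch below is a minimal illustration of that scoring step only; the decoder itself (the second LSTM from the paper) is assumed to exist elsewhere, and the function names `reconstruction_score` and `rank_models` are hypothetical, not from the paper.

```python
import numpy as np

def reconstruction_score(original, reconstructed):
    """Mean squared error between the original line image and the
    image recreated from a model's OCR output. Lower error suggests
    the OCR output preserved more of the input's information."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    if original.shape != reconstructed.shape:
        raise ValueError("original and reconstruction must share a shape")
    return float(np.mean((original - reconstructed) ** 2))

def rank_models(line_image, reconstructions):
    """Rank candidate OCR models by the reconstruction error of a
    decoder fed with each model's output (no transcriptions needed).

    `reconstructions` maps a model name to the image the decoder
    produced from that model's OCR output.
    """
    scores = {name: reconstruction_score(line_image, rec)
              for name, rec in reconstructions.items()}
    # Best model first: smallest reconstruction error.
    return sorted(scores.items(), key=lambda kv: kv[1])
```

For example, given one line image and decoder outputs for two candidate models, `rank_models` orders the models by how faithfully their OCR output allowed the image to be recreated, which the paper shows tracks transcription-based rankings.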
