Training LSTM-RNN with Imperfect Transcription: Limitations and Outcomes

Bidirectional LSTM-RNNs have become one of the standard methods for sequence learning, especially in OCR, due to their ability to process unsegmented data and their inherent statistical language modeling [5]. It has recently been shown that training LSTM-RNNs even with imperfect transcriptions can improve transcription results [7, 14]. The statistical nature of the LSTM's inherent language modeling can compensate for some of the errors in the ground truth and still learn the correct temporal relations. In this paper we systematically explore the limits of this language modeling ability by comparing the impact of imperfect transcriptions containing various handcrafted error types with that of real erroneous data created through segmentation and clustering. We show that training LSTM-RNNs with imperfect transcriptions can produce useful OCR models even when the ground truth error rate is up to 20%. Furthermore, we show that they can almost perfectly compensate for some handcrafted error types at error rates of up to 40%.
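The exact handcrafted error types are described later in the paper; purely as an illustration, the following sketch shows one way character-level noise at a controlled rate could be injected into ground-truth transcriptions. The function name `corrupt_transcription`, the choice of substitution/insertion/deletion operations, and the alphabet are assumptions for demonstration, not the paper's actual procedure.

```python
import random


def corrupt_transcription(text, error_rate, alphabet, seed=0):
    """Inject character-level substitution, insertion, and deletion errors
    into a ground-truth transcription at approximately `error_rate`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if rng.random() < error_rate:
            op = rng.choice(("substitute", "insert", "delete"))
            if op == "substitute":
                out.append(rng.choice(alphabet))   # replace with a random character
            elif op == "insert":
                out.append(ch)
                out.append(rng.choice(alphabet))   # keep the character, add a spurious one
            # "delete": drop the character entirely
        else:
            out.append(ch)
    return "".join(out)


# Example: corrupt roughly 20% of the characters in a training label.
alphabet = "abcdefghijklmnopqrstuvwxyz "
print(corrupt_transcription("training lstm with imperfect labels", 0.20, alphabet))
```

Sweeping `error_rate` over a range of values would be one way to measure at which noise level the trained models stop being useful.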