High Performance Text Recognition Using a Hybrid Convolutional-LSTM Implementation

Optical character recognition (OCR) has made great progress in recent years due to the introduction of recognition engines based on recurrent neural networks, in particular the LSTM architecture. This paper describes a new, open-source line recognizer combining deep convolutional networks and LSTMs, implemented in PyTorch and using CUDA kernels for speed. Experimental results compare the performance of different combinations of geometric normalization, 1D LSTM, deep convolutional networks, and 2D LSTM networks. An important result is that deep hybrid networks without geometric text-line normalization outperform 1D LSTM networks with geometric normalization, while deep hybrid networks with geometric text-line normalization still outperform all other networks. The best networks achieve a throughput of more than 100 lines per second and a test set error rate of 0.25% on UW3.
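The abstract does not state how the error rate is computed. A common convention for OCR line recognizers is the character error rate: the total edit distance between recognized and ground-truth lines, divided by the total number of ground-truth characters. The following is a minimal sketch under that assumption; the function names `edit_distance` and `character_error_rate` are illustrative and are not taken from the paper's software.

```python
from typing import Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between a reference and a hypothesis string."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Assumed metric: summed edit distance over summed reference length.

    `pairs` yields (ground_truth_line, recognized_line) tuples.
    """
    pairs = list(pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    if total == 0:
        raise ValueError("no reference characters to score against")
    return errors / total
```

Under this definition, an error rate of 0.25% corresponds to roughly one character error per 400 ground-truth characters.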
