Combining Convolutional Neural Networks and LSTMs for Segmentation-Free OCR

We present a novel end-to-end trainable OCR system that combines a CNN for feature extraction with 1-D LSTMs for sequence modeling. We report results on English and Arabic handwriting data and on English machine-print data, showing state-of-the-art performance. We believe that our method is simpler than existing 2-D LSTM models and will make it easier to apply techniques from CNN research in computer vision to improve OCR performance.
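To make the CNN-to-1-D-LSTM hand-off concrete, the sketch below traces only the tensor shapes involved. It is an illustration under stated assumptions, not the configuration used in this work. The layer sizes, the `ConvBlock` helper, and the column-wise flattening are hypothetical. Convolutions are assumed to preserve spatial size, so only pooling shrinks the feature map. Each column of the final feature map becomes one frame of the sequence that a 1-D LSTM would read from left to right.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ConvBlock:
    """Hypothetical conv + pooling stage; the conv is assumed size-preserving."""
    out_channels: int
    pool_h: int = 1
    pool_w: int = 1


def cnn_output_shape(height: int, width: int,
                     blocks: Sequence[ConvBlock]) -> Tuple[int, int, int]:
    """Return (channels, height, width) after applying every block in order."""
    channels = 1  # assumed grayscale line image
    for block in blocks:
        channels = block.out_channels
        height //= block.pool_h
        width //= block.pool_w
    return channels, height, width


def to_frame_sequence(feature_map: List[List[List[float]]]) -> List[List[float]]:
    """Turn feature_map[c][y][x] into `width` frames of length channels * height.

    Each frame stacks all channels and rows of one column, so a 1-D recurrent
    layer can consume the image as a left-to-right sequence.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels) for y in range(height)]
        for x in range(width)
    ]


if __name__ == "__main__":
    # Hypothetical three-stage CNN applied to a 32 x 128 text-line image.
    blocks = [ConvBlock(16, 2, 2), ConvBlock(32, 2, 2), ConvBlock(64, 2, 1)]
    c, h, w = cnn_output_shape(32, 128, blocks)
    print(f"feature map: {c} x {h} x {w}")
    print(f"LSTM input: {w} frames of dimension {c * h}")
```

In this sketch the sequence length equals the width of the final feature map, so horizontal pooling directly controls how many frames the recurrent layer sees per text line.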
