Boosting the Deep Multidimensional Long-Short-Term Memory Network for Handwritten Recognition Systems

One of the main challenges in handwriting recognition lies in recognizing complete lines of handwritten text. In this paper, we propose a handwriting recognition system based on a deep multidimensional long-short-term memory (MDLSTM) network within a hybrid hidden Markov model framework. The MDLSTM architecture was designed to improve recognition performance and reduce recognition time. To that end, we modify the order of the layers and the number of pooling layers relative to a standard MDLSTM model. Since the results reported in the literature for deeper MDLSTM architectures rely on optimizing the network width at a fixed depth, we investigate the trade-off between width and depth to obtain an optimal topology. The system was evaluated on English handwritten text lines from the IAM database. The experiments demonstrated that the proposed MDLSTM architecture maintains robust recognition performance (around 3.6% CER and 10.5% WER) while delivering significant speedups: it is approximately 48% faster in learning time and 32% faster in classification time than the state-of-the-art MDLSTM optical model. The full system, including a decoder with linguistic knowledge, achieves results competitive with the state of the art.
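
The abstract reports results as character error rate (CER) and word error rate (WER). As a minimal sketch of how such rates are commonly computed, assuming the usual definition (Levenshtein edit distance between the reference and the recognized text, divided by the reference length), the following may help. The function names `edit_distance`, `cer`, and `wer` are illustrative and are not taken from the paper, and the paper's exact scoring protocol (for example its tokenization or normalization rules) may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by reference word count.

    Assumption: words are separated by whitespace.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    reference = "hello world"
    hypothesis = "hallo world"
    print(f"CER: {cer(reference, hypothesis):.3f}")
    print(f"WER: {wer(reference, hypothesis):.3f}")
```

In this toy example a single substituted character is one error out of 11 reference characters, but it makes one of the two reference words wrong, which illustrates why WER is typically much higher than CER on the same output.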
