Bidirectional Decoder Networks for Attention-Based End-to-End Offline Handwriting Recognition

Recurrent neural networks that can be trained end-to-end on sequence learning tasks offer promising advantages over traditional recognition systems. In this paper, we demonstrate the application of an attention-based long short-term memory (LSTM) decoder network to offline handwriting recognition and analyze the segmentation, classification and decoding errors produced by the model. We further extend the decoder network with a bidirectional topology and an integrated length estimation procedure, and show that it outperforms unidirectional decoder networks. Results are presented for the word and text line recognition tasks of the RIMES handwriting recognition database. The software used in the experiments is freely available for academic research purposes.
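To make the attention step of such a decoder concrete, here is a minimal sketch of a single attention computation. It assumes Bahdanau-style additive scoring, e_t = v · tanh(W_dec s + W_enc h_t), followed by a softmax and a weighted sum of encoder states. The abstract does not say which scoring function the model uses, so treat the scoring form as an assumption. The names `additive_attention`, `w_dec`, `w_enc` and `v` are hypothetical, and the weights are fixed toy values rather than trained parameters.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(
    decoder_state: Sequence[float],
    encoder_states: Sequence[Sequence[float]],
    w_dec: Matrix,
    w_enc: Matrix,
    v: Sequence[float],
) -> Tuple[Vector, Vector]:
    """One attention step (hypothetical helper, additive scoring assumed).

    Scores every encoder frame against the current decoder state,
    normalizes the scores into attention weights, and returns the
    weights together with the weighted sum (context vector).
    """
    projected_state = matvec(w_dec, decoder_state)
    scores = []
    for h_t in encoder_states:
        projected_frame = matvec(w_enc, h_t)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_frame)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(weights[t] * encoder_states[t][d] for t in range(len(encoder_states)))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy input: three encoder frames of dimension 2, e.g. positions along a
    # text line image after an encoder; identity projections for readability.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    weights, context = additive_attention([0.5, -0.5], frames, identity, identity, [1.0, 1.0])
    print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Because the weights come from a softmax, the context vector is always a convex combination of encoder frames. In a decoder of this kind, that soft alignment is what lets each emitted character draw on a region of the image without an explicit segmentation step.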
