Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention

We present an attention-based model for end-to-end handwriting recognition. The system requires no segmentation of the input paragraph. The model is inspired by the differentiable attention models recently proposed for speech recognition, image captioning, and translation. The main difference is that covert and overt attention are implemented with a multi-dimensional LSTM (MDLSTM) network. Our principal contribution to handwriting recognition is automatic transcription without prior segmentation into lines, a step that was critical in previous approaches. Moreover, the system is able to learn the reading order, enabling it to handle bidirectional scripts such as Arabic. We carried out experiments on the well-known IAM Database and report encouraging results, which suggest that full paragraph transcription may be achievable in the near future.
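
To make the idea of differentiable attention over a two-dimensional input concrete, the following is a minimal sketch of soft attention on a grid of feature vectors: a softmax over all positions yields a weight map, and the weighted sum of the features is what a decoder would read at one step. It is not the model described above. The function name `soft_attention`, the toy values, and the use of externally supplied scores are assumptions made for illustration; in the paper the attention is computed by an MDLSTM network, which is not reproduced here.

```python
import math


def soft_attention(features, scores):
    """Weighted sum of a 2D grid of feature vectors (illustrative only).

    features: H x W grid, each cell a list of D floats.
    scores:   H x W grid of unnormalized attention scores
              (assumed given; the paper derives them with an MDLSTM).
    Returns (weights, context): an H x W grid of weights that sum to 1,
    and the D-dimensional weighted sum of the features.
    """
    # Softmax over all H*W positions, shifted by the max for stability.
    top = max(max(row) for row in scores)
    exps = [[math.exp(s - top) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]

    # Accumulate the attention-weighted sum of the feature vectors.
    dim = len(features[0][0])
    context = [0.0] * dim
    for feat_row, w_row in zip(features, weights):
        for feat, w in zip(feat_row, w_row):
            for d in range(dim):
                context[d] += w * feat[d]
    return weights, context


# Toy 2 x 2 feature map with 2-dimensional features (made-up values).
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[2.0, 2.0], [4.0, 0.0]]]
scores = [[0.0, 0.0],
          [0.0, 0.0]]  # equal scores give uniform attention
weights, context = soft_attention(features, scores)
```

Because every operation here is differentiable, a scoring network placed in front of this step could be trained by backpropagation through the transcription loss, which is what lets such a model learn where to look without line-level segmentation labels.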
