WSRNet: Joint Spotting and Recognition of Handwritten Words

In this work, we present a unified model that handles both Keyword Spotting and Word Recognition with a single network architecture. The proposed network comprises a non-recurrent CTC branch and a Seq2Seq branch that is further augmented with an Autoencoding module. Training with the resulting joint loss boosts recognition performance, while the Seq2Seq branch is used to create efficient word representations. We show how to further process these representations with binarization and a retraining scheme to obtain compact and highly efficient descriptors suitable for keyword spotting. Numerical results validate the usefulness of the proposed architecture: our method outperforms the previous state-of-the-art in keyword spotting and achieves results comparable to those of the leading methods for word recognition.
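To illustrate why binarized descriptors are attractive for keyword spotting, the following is a minimal sketch and not the method used in this work. It assumes real-valued word descriptors are thresholded at zero, packed into bytes, and compared by Hamming distance; the function names `binarize` and `hamming_rank`, the zero threshold, and the toy 8-dimensional vectors are all hypothetical.

```python
import numpy as np


def binarize(descriptors):
    """Threshold real-valued descriptors at zero and pack the bits into bytes.

    The zero threshold is an assumption made for this sketch.
    """
    bits = (np.asarray(descriptors) > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


def hamming_rank(query_code, gallery_codes):
    """Rank gallery codes by Hamming distance to the query code.

    Returns the gallery indices (closest first) and the raw distances.
    """
    xor = np.bitwise_xor(gallery_codes, query_code)
    distances = np.unpackbits(xor, axis=-1).sum(axis=-1)
    order = np.argsort(distances, kind="stable")
    return order, distances


# Toy data standing in for learned word representations (hypothetical values).
gallery = np.array([
    [0.9, -0.2, 0.4, -0.7, 0.1, -0.3, 0.8, -0.5],
    [-0.6, 0.3, -0.1, 0.5, -0.9, 0.2, -0.4, 0.7],
    [0.8, -0.1, 0.5, -0.6, -0.2, -0.4, 0.9, -0.3],
])
query = np.array([0.7, -0.3, 0.6, -0.8, 0.2, -0.1, 0.5, -0.6])

order, distances = hamming_rank(binarize(query), binarize(gallery))
print(order, distances)
```

With packed codes, each descriptor occupies one bit per dimension, and retrieval reduces to XOR and bit counting, which is the kind of compactness and efficiency the abstract refers to.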
