Word spotting and recognition via a joint deep embedding of image and text

Abstract This work addresses three important yet challenging problems of handwritten text understanding: word recognition, query-by-example (QBE) word spotting and query-by-string (QBS) word spotting. Most existing approaches treat these related tasks independently. We propose a single unified framework based on deep learning that solves all three tasks efficiently and simultaneously. In this framework, an end-to-end deep neural network architecture jointly embeds handwritten word images and their text. Word images are embedded via a convolutional neural network (CNN), which is trained to predict a representation modeling character-level information. The output of the last convolutional layer is taken as the representation in the joint embedding subspace. Likewise, a recurrent neural network (RNN) maps a sequence of characters to a representation in the same joint subspace. Finally, a model based on multi-layer perceptrons is proposed to predict the matching probability between two embedding vectors. Experiments on five databases of documents written in three languages show that our method yields state-of-the-art performance for QBE and QBS word spotting. The proposed method also obtains competitive results for word recognition when compared against approaches tailored specifically for this task.
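
To make the unified framework concrete, the sketch below shows how one image embedder, one text embedder and one matching function could serve all three tasks. It is an illustration under stated assumptions, not the authors' implementation: every name here (`rank_by_match`, `qbe_spot`, `qbs_spot`, `recognize`, and the `toy_*` functions) is hypothetical, the toy embedders are simple stand-ins for the CNN and RNN, the toy matcher stands in for the multi-layer perceptron model, and lexicon-based recognition is an assumed way of using the matcher for recognition.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
MatchFn = Callable[[Vector, Vector], float]


def rank_by_match(query: Vector, gallery: List[Vector],
                  match_prob: MatchFn) -> List[Tuple[int, float]]:
    """Return (gallery index, score) pairs sorted by decreasing match score."""
    scored = [(i, match_prob(query, vec)) for i, vec in enumerate(gallery)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def qbe_spot(query_image, gallery, embed_image, match_prob):
    """Query-by-example: the query is a word image."""
    return rank_by_match(embed_image(query_image), gallery, match_prob)


def qbs_spot(query_string, gallery, embed_text, match_prob):
    """Query-by-string: the query is a character sequence."""
    return rank_by_match(embed_text(query_string), gallery, match_prob)


def recognize(image, lexicon, embed_image, embed_text, match_prob):
    """Recognition (assumed lexicon-based): pick the best-matching word."""
    image_vec = embed_image(image)
    return max(lexicon, key=lambda w: match_prob(image_vec, embed_text(w)))


# --- Toy stand-ins (hypothetical; NOT the networks described above) ---
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_embed_text(word: str) -> List[float]:
    """Stand-in for the RNN: a character-count vector."""
    return [float(word.count(ch)) for ch in ALPHABET]


def toy_embed_image(image: Dict[str, List[float]]) -> List[float]:
    """Stand-in for the CNN: the toy 'image' carries precomputed features."""
    return image["features"]


def toy_match_prob(u: Vector, v: Vector) -> float:
    """Stand-in for the MLP matcher: a score in (0, 1] from squared distance."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)))


# A tiny gallery of toy "word images" for the words the, then, cat.
words = ["the", "then", "cat"]
images = [{"features": toy_embed_text(w)} for w in words]
gallery_vecs = [toy_embed_image(img) for img in images]

qbs_hits = qbs_spot("cat", gallery_vecs, toy_embed_text, toy_match_prob)
qbe_hits = qbe_spot(images[1], gallery_vecs, toy_embed_image, toy_match_prob)
label = recognize(images[0], ["cat", "the", "then"],
                  toy_embed_image, toy_embed_text, toy_match_prob)
```

The point of the design is that the gallery is embedded once and then reused: switching between QBE, QBS and recognition changes only which embedder produces the query vector, while the ranking step and the matching function stay the same.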
