Evaluating Word String Embeddings and Loss Functions for CNN-Based Word Spotting

The recent past has seen CNNs take over the field of word spotting. The dominance of these neural networks is fueled by learning to predict a word string embedding for a given input image. While the PHOC (Pyramidal Histogram of Characters) is most prominently used, other embeddings such as the Discrete Cosine Transform of Words have been used as well. In this work, we investigate the use of different word string embeddings for word spotting. For this, we make use of the recently proposed PHOCNet and modify it to be able to not only learn binary representations. Our extensive evaluation shows that a large number of combinations of word string embeddings and loss functions achieve roughly the same results on different word spotting benchmarks. This leads us to the conclusion that no word string embedding is really superior to another and new embeddings should focus on incorporating more information than only character counts and positions.

[1]  Josep Lladós,et al.  Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method , 2011, 2011 International Conference on Document Analysis and Recognition.

[2]  Volkmar Frinken,et al.  A Novel Word Spotting Method Based on Recurrent Neural Networks , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Josep Lladós,et al.  Efficient segmentation-free keyword spotting in historical document collections , 2015, Pattern Recognit..

[4]  Charu C. Aggarwal,et al.  On the Surprising Behavior of Distance Metrics in High Dimensional Spaces , 2001, ICDT.

[5]  José A. Rodríguez-Serrano,et al.  Label embedding for text recognition , 2013, BMVC.

[6]  Horst Bunke,et al.  The IAM-database: an English sentence database for offline handwriting recognition , 2002, International Journal on Document Analysis and Recognition.

[7]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  C. V. Jawahar,et al.  Deep Feature Embedding for Accurate Recognition and Retrieval of Handwritten Text , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[9]  José A. Rodríguez-Serrano,et al.  A Model-Based Sequence Similarity with Application to Handwritten Word Spotting , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Josep Lladós,et al.  Integrating Visual and Textual Cues for Query-by-String Word Spotting , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[11]  Josep Lladós,et al.  Towards query-by-speech handwritten keyword spotting , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[12]  Alicia Fornés,et al.  The ESPOSALLES database: An ancient marriage license corpus for off-line handwriting recognition , 2013, Pattern Recognit..

[13]  Gernot A. Fink,et al.  Word Spotting in Historical Document Collections with Online-Handwritten Queries , 2016, 2016 12th IAPR Workshop on Document Analysis Systems (DAS).

[14]  Gernot A. Fink,et al.  A Modified Isomap Approach to Manifold Learning in Word Spotting , 2015, GCPR.

[15]  Konstantinos Zagoris,et al.  ICFHR2016 Handwritten Keyword Spotting Competition (H-KWS 2016) , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[16]  Ernest Valveny,et al.  Word Spotting and Recognition with Embedded Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Anders Brun,et al.  Semantic and Verbatim Word Spotting Using Deep Neural Networks , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[18]  Gernot A. Fink,et al.  Bag-of-Features HMMs for Segmentation-Free Word Spotting in Handwritten Documents , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[19]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[20]  Sudholt Sebastian,et al.  PHOCNet: A Deep Convolutional Neural Network for Word Spotting in Handwritten Documents , 2016 .

[21]  François Chollet Information-theoretical label embeddings for large-scale image classification , 2016, ArXiv.

[22]  Pedro M. Domingos A few useful things to know about machine learning , 2012, Commun. ACM.