Query-by-Online Word Spotting Revisited: Using CNNs for Cross-Domain Retrieval

A word spotting system is largely characterized by the query modalities it can process. The most common modalities are Query-by-Example and Query-by-String. Recently, however, a new query type has been proposed: in Query-by-Online-Trajectory (QbO), the query is presented as a set of online-handwritten trajectories. In this work, we devise a cross-domain word spotting framework using CNNs that can accomplish the QbO task. In particular, we design two different QbO systems, which we evaluate in a number of experiments. We not only outperform the current state of the art in QbO word spotting but also show that a system using a single CNN for both online and offline data achieves superior results compared to a system that uses a separate CNN for each domain.
