Efficient Learning-Free Keyword Spotting

In this article, a method for segmentation-based, learning-free Query by Example (QbE) keyword spotting on handwritten documents is proposed. The method consists of three steps: preprocessing, feature extraction and matching. Together they address critical variations of text images (e.g., skew, translation, different writing styles). During the feature extraction step, a sequence of descriptors is generated by combining a zoning scheme with a novel appearance descriptor, referred to as modified Projections of Oriented Gradients. The preprocessing step, which includes contrast normalization and main-zone detection, aims to overcome the shortcomings of the appearance descriptor. Moreover, an uneven zoning scheme is introduced: a denser zoning is applied only to query images for a more detailed representation, while the word images of the document collection keep a coarser one. This leads to a significant reduction in the storage requirements of a document collection. The distance between the query and word sequences is efficiently computed by the proposed Selective Matching algorithm. This algorithm is further extended to handle an augmented set of images originating from a single query image. The efficiency of the proposed method is demonstrated by experiments conducted on seven publicly available datasets. In these experiments, the proposed method significantly outperforms all state-of-the-art learning-free techniques.
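The uneven zoning idea can be illustrated with a minimal sketch. The abstract does not specify the modified Projections of Oriented Gradients descriptor or the Selective Matching algorithm, so both are replaced here by simple stand-ins: a magnitude-weighted histogram of gradient orientations per vertical zone, and a matching rule that compares each coarse word zone with the dense query zones covering the same relative horizontal span. The function names `zone_descriptors` and `uneven_distance`, the number of zones and the number of orientation bins are all hypothetical choices made for illustration.

```python
import numpy as np


def zone_descriptors(img, n_zones, n_bins=8):
    """Describe a grayscale word image as a sequence of per-zone descriptors.

    Stand-in descriptor (NOT the paper's mPOG): each vertical zone gets an
    L2-normalized, magnitude-weighted histogram of gradient orientations.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi); the clamp guards against rounding to pi.
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    # Column boundaries of the vertical zones.
    edges = np.linspace(0, img.shape[1], n_zones + 1).astype(int)
    descs = np.zeros((n_zones, n_bins))
    for z in range(n_zones):
        cols = slice(edges[z], edges[z + 1])
        descs[z] = np.bincount(
            bins[:, cols].ravel(),
            weights=mag[:, cols].ravel(),
            minlength=n_bins,
        )
        norm = np.linalg.norm(descs[z])
        if norm > 0:
            descs[z] /= norm
    return descs


def uneven_distance(query_descs, word_descs):
    """Distance between a densely zoned query and a coarsely zoned word.

    Stand-in matching rule (NOT the paper's Selective Matching): each word
    zone is compared with the query zones in the same relative horizontal
    span, the smallest Euclidean distance is kept, and the results are
    averaged over the word zones.
    """
    nq, nw = len(query_descs), len(word_descs)
    total = 0.0
    for j in range(nw):
        lo = (j * nq) // nw
        hi = max(lo + 1, ((j + 1) * nq) // nw)
        d = np.linalg.norm(query_descs[lo:hi] - word_descs[j], axis=1)
        total += d.min()
    return total / nw


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query_img = rng.random((32, 96))                      # synthetic "query" image
    word_imgs = [rng.random((32, 96)) for _ in range(3)]  # synthetic "collection"
    word_imgs.append(query_img.copy())                    # one true match

    q = zone_descriptors(query_img, n_zones=12)           # dense zoning: query only
    collection = [zone_descriptors(w, n_zones=6) for w in word_imgs]  # coarse, stored
    ranking = np.argsort([uneven_distance(q, w) for w in collection])
    print("ranked word indices:", ranking)
```

In this sketch only the query carries the dense representation, so the stored collection holds 6 descriptors per word instead of 12, which is the kind of storage saving the uneven zoning scheme targets.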
