Training-Free and Segmentation-Free Word Spotting using Feature Matching and Query Expansion

Historical handwritten text recognition is an interesting yet challenging problem. In recent times, deep learning based methods have achieved significant performance in handwritten text recognition. However, handwriting recognition using deep learning needs training data, and often, text must be previously segmented into lines (or even words). These limitations constrain the application of HTR techniques in document collections, because training data or segmented words are not always available. Therefore, this paper proposes a training-free and segmentation-free word spotting approach that can be applied in unconstrained scenarios. The proposed word spotting framework is based on document query word expansion and relaxed feature matching algorithm, which can easily be parallelised. Since handwritten words posses distinct shape and characteristics, this work uses a combination of different keypoint detectors and Fourier-based descriptors to obtain a sufficient degree of relaxed matching. The effectiveness of the proposed method is empirically evaluated on well-known benchmark datasets using standard evaluation measures. The use of informative features along with query expansion significantly contributed in efficient performance of the proposed method.

[1]  Frank Lebourgeois,et al.  Towards an omnilingual word retrieval system for ancient manuscripts , 2009, Pattern Recognit..

[2]  Anders Hast,et al.  Radial Line Fourier Descriptor for Historical Handwritten Text Representation , 2019 .

[3]  Gernot A. Fink,et al.  PHOCNet: A Deep Convolutional Neural Network for Word Spotting in Handwritten Documents , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[4]  Josep Lladós,et al.  Efficient segmentation-free keyword spotting in historical document collections , 2015, Pattern Recognit..

[5]  Ernest Valveny,et al.  Word Spotting and Recognition with Embedded Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Konstantinos Zagoris,et al.  Unsupervised Word Spotting in Historical Handwritten Document Images Using Document-Oriented Local Features , 2017, IEEE Transactions on Image Processing.

[7]  Anders Hast,et al.  Automatic Document Image Binarization using Bayesian Optimization , 2017, HIP@ICDAR.

[8]  C. V. Jawahar,et al.  Improving CNN-RNN Hybrid Networks for Handwriting Recognition , 2018, 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[9]  Ernest Valveny,et al.  Segmentation-free word spotting with exemplar SVMs , 2014, Pattern Recognit..

[10]  R. Manmatha,et al.  Word spotting for historical documents , 2007, International Journal of Document Analysis and Recognition (IJDAR).

[11]  José A. Rodríguez-Serrano,et al.  A Model-Based Sequence Similarity with Application to Handwritten Word Spotting , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Núria Cirera,et al.  BH2M: The Barcelona Historical, Handwritten Marriages Database , 2014, 2014 22nd International Conference on Pattern Recognition.

[13]  Gernot A. Fink,et al.  Segmentation-free query-by-string word spotting with Bag-of-Features HMMs , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[14]  Hermann Ney,et al.  Handwriting Recognition with Large Multidimensional Long Short-Term Memory Recurrent Neural Networks , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[15]  Basilios Gatos,et al.  A survey of document image word spotting techniques , 2017, Pattern Recognit..

[16]  José A. Rodríguez-Serrano,et al.  Handwritten word-spotting using hidden Markov models and universal vocabularies , 2009, Pattern Recognit..

[17]  Nicholas R. Howe,et al.  Part-Structured Inkball Models for One-Shot Handwritten Word Spotting , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[18]  Alejandro Héctor Toselli,et al.  ICDAR2015 Competition on Keyword Spotting for Handwritten Documents , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[19]  Anders Hast,et al.  A Fast Fourier based Feature Descriptor and a Cascade Nearest Neighbour Search with an Efficient Matching Pipeline for Mosaicing of Microscopy Images , 2018 .

[20]  Alicia Fornés,et al.  A Coarse-to-Fine Word Spotting Approach for Historical Handwritten Documents Based on Graph Embedding and Graph Edit Distance , 2014, 2014 22nd International Conference on Pattern Recognition.

[21]  Alicia Fornés,et al.  A Segmentation-Free Handwritten Word Spotting Approach by Relaxed Feature Matching , 2016, 2016 12th IAPR Workshop on Document Analysis Systems (DAS).

[22]  Volkmar Frinken,et al.  A Novel Word Spotting Method Based on Recurrent Neural Networks , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Konstantinos Zagoris,et al.  ICFHR2016 Handwritten Keyword Spotting Competition (H-KWS 2016) , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[24]  Ernest Valveny,et al.  A Coarse-to-Fine Approach for Handwritten Word Spotting in Large Scale Historical Documents Collection , 2012, 2012 International Conference on Frontiers in Handwriting Recognition.

[25]  Ernest Valveny,et al.  Efficient Exemplar Word Spotting , 2012, BMVC.

[26]  C. V. Jawahar,et al.  Word Spotting and Recognition Using Deep Embedding , 2018, 2018 13th IAPR International Workshop on Document Analysis Systems (DAS).