A Segmentation-Free Handwritten Word Spotting Approach by Relaxed Feature Matching

The automatic recognition of historical handwritten documents remains a challenging task. Word spotting therefore emerges as a good alternative for making the information contained in these documents available to the user. Word spotting is the task of retrieving all instances of a query word in a document collection, which makes it a useful tool for information retrieval. In this paper we propose a segmentation-free word spotting approach that can deal with large document collections. Our method is inspired by feature matching algorithms that have been applied to image matching and retrieval. Since instances of the same handwritten word differ in shape, there is no exact transformation to be obtained. Instead, a sufficient degree of relaxation is achieved by using a Fourier-based descriptor and an alternative to RANSAC called PUMA. The proposed approach is evaluated on historical marriage records, achieving promising results.
