Radial Line Fourier Descriptor for Historical Handwritten Text Representation

Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval, or word spotting, is widely used for information retrieval and digitization of historical handwritten documents. However, the performance of word spotting algorithms depends heavily on the feature detection and representation methods. Although popular feature descriptors exist, such as Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), their invariance properties amplify the noise in degraded document images, making them sensitive to the noise and complex characteristics of historical manuscripts. An efficient and relaxed feature descriptor is therefore required, because handwritten words across different documents are similar but not identical. This paper introduces a Radial Line Fourier (RLF) descriptor for handwritten word representation, with a short feature vector of 32 dimensions. A segmentation-free and training-free handwritten word spotting method is studied that relies on the proposed RLF descriptor, takes into account different keypoint representations, and uses a simple preconditioner-based feature matching algorithm. The effectiveness of the RLF descriptor for segmentation-free handwritten word spotting is empirically evaluated on well-known historical handwritten datasets using standard evaluation measures.
