Radial Line Fourier Descriptor for Segmentation-free Handwritten Word Spotting

Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval, or word spotting, is widely used for information retrieval and digitization of historical handwritten documents. However, the performance of word spotting algorithms depends heavily on the feature detection and representation methods. Popular feature descriptors exist, such as Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), but their invariance properties amplify the noise in degraded document images, making them more sensitive to noise and to the complex characteristics of historical manuscripts. An efficient and relaxed feature descriptor is therefore required, because handwritten words across different documents are similar but not identical. This paper introduces a Radial Line Fourier (RLF) descriptor for handwritten word representation, with a short feature vector of 32 dimensions. It studies a segmentation-free and training-free handwritten word spotting method that relies on the proposed RLF descriptor, takes different keypoint representations into account, and uses a simple preconditioner-based feature matching algorithm. The effectiveness of the RLF descriptor for segmentation-free handwritten word spotting is evaluated empirically on well-known historical handwritten datasets using standard evaluation measures.
