Graph-Based Keyword Spotting in Historical Handwritten Documents

The number of handwritten documents that are digitally available is increasing rapidly. However, these documents remain difficult to access, especially with respect to searching and browsing. This paper aims to close this gap by means of a novel method for keyword spotting in historical handwritten documents. The proposed system relies on a keypoint-based graph representation of individual words. Keypoints are characteristic points in a word image and are represented by nodes, while edges represent the strokes between two keypoints. The actual keyword spotting is then carried out by matching the graph of a query word against the graphs of the document words, using a recent approximation algorithm for graph edit distance. The novel framework for graph-based keyword spotting is evaluated on the George Washington dataset, where it clearly outperforms a state-of-the-art reference system.
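To make the matching step more concrete, the following is a minimal sketch of how two keypoint graphs could be compared with an approximate graph edit distance. It is not the algorithm used in the paper, which the abstract does not specify. It uses a simple greedy node assignment, which yields a valid edit path and therefore an upper bound on the exact edit distance. The function name `greedy_ged`, the cost parameters `tau_v` and `tau_e`, and the cost model (Euclidean distance for node substitution, constant costs for insertions and deletions, free edge substitution) are assumptions made for illustration only.

```python
import math


def greedy_ged(nodes1, edges1, nodes2, edges2, tau_v=1.0, tau_e=1.0):
    """Greedy upper bound on the graph edit distance of two keypoint graphs.

    nodes*: list of (x, y) keypoint coordinates; a node is its list index.
    edges*: iterable of (i, j) index pairs, undirected, no self-loops.
    tau_v / tau_e: assumed constant cost of deleting or inserting a node / edge.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}

    mapping = {}                    # node in graph 1 -> node in graph 2
    free = set(range(len(nodes2)))  # nodes of graph 2 not yet assigned
    cost = 0.0

    # Node operations: substitute with the nearest free node if that is
    # cheaper than one deletion plus one insertion, otherwise delete.
    for i, p in enumerate(nodes1):
        best = min(free, key=lambda j: math.dist(p, nodes2[j]), default=None)
        if best is not None and math.dist(p, nodes2[best]) < 2 * tau_v:
            mapping[i] = best
            free.remove(best)
            cost += math.dist(p, nodes2[best])
        else:
            cost += tau_v           # delete node i
    cost += tau_v * len(free)       # insert the unmatched nodes of graph 2

    # Edge operations implied by the node mapping.
    covered = set()
    for e in e1:
        u, v = tuple(e)
        if u in mapping and v in mapping:
            image = frozenset((mapping[u], mapping[v]))
            if image in e2:
                covered.add(image)  # edge is preserved, substitution is free
                continue
        cost += tau_e               # delete edge of graph 1
    cost += tau_e * len(e2 - covered)  # insert remaining edges of graph 2
    return cost


# Toy word graphs: the second has one extra keypoint and one extra stroke.
query_nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
query_edges = [(0, 1), (1, 2)]
word_nodes = query_nodes + [(5.0, 5.0)]
word_edges = query_edges + [(2, 3)]

same = greedy_ged(query_nodes, query_edges, query_nodes, query_edges)
diff = greedy_ged(query_nodes, query_edges, word_nodes, word_edges)
```

In a spotting scenario, a distance of this kind would be computed between the query graph and every word graph in the document, and the words would be ranked by increasing distance. A greedy assignment is order-dependent and generally looser than an optimal assignment, so it serves here only to illustrate the idea of deriving an edit cost from a node mapping.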
