Handwritten word spotting by inexact matching of grapheme graphs

This paper presents a graph-based word spotting approach for handwritten documents. In contrast to most word spotting techniques, which rely on statistical representations, we propose a structural representation that is robust to the inherent deformations of handwriting. Attributed graphs are constructed using a part-based approach: graphemes extracted from shape convexities serve as stable units of handwriting and are associated with graph nodes, while the spatial relations between them determine the graph edges. Spotting is defined as error-tolerant graph matching, computed with a bipartite graph matching algorithm. To make the method usable on large datasets, a graph indexing approach based on binary embeddings is applied as a preprocessing step. The method is evaluated on historical documents. The approach is comparable to statistical ones in terms of time and memory requirements, especially when dealing with large document collections.
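The error-tolerant matching step can be illustrated with a minimal sketch of a bipartite approximation of graph edit distance. It is not the implementation used in the paper. The node attributes (2D points standing in for grapheme descriptors), the use of node degree as a proxy for local edge structure, the cost constants, and the function names `build_cost_matrix` and `approx_edit_distance` are all assumptions made for illustration. The assignment is solved by brute force, which is only practical for very small graphs; a real system would use a Hungarian-style solver.

```python
from itertools import permutations
from math import dist, inf


def build_cost_matrix(nodes_a, degs_a, nodes_b, degs_b,
                      node_cost=1.0, edge_cost=0.5):
    """Build the (n+m) x (n+m) cost matrix for a bipartite assignment.

    Top-left block: substitution costs. Top-right diagonal: deletions.
    Bottom-left diagonal: insertions. Bottom-right block: zeros.
    Costs and attributes are illustrative assumptions.
    """
    n, m = len(nodes_a), len(nodes_b)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                # Substitute node i of A by node j of B.
                cost[i][j] = (dist(nodes_a[i], nodes_b[j])
                              + edge_cost * abs(degs_a[i] - degs_b[j]))
            elif i < n:
                # Delete node i of A (only allowed on the diagonal).
                cost[i][j] = (node_cost + edge_cost * degs_a[i]
                              if j - m == i else inf)
            elif j < m:
                # Insert node j of B (only allowed on the diagonal).
                cost[i][j] = (node_cost + edge_cost * degs_b[j]
                              if i - n == j else inf)
    return cost


def approx_edit_distance(nodes_a, degs_a, nodes_b, degs_b):
    """Return the cost of the cheapest node assignment (brute force)."""
    cost = build_cost_matrix(nodes_a, degs_a, nodes_b, degs_b)
    size = len(cost)
    return min(sum(cost[i][p[i]] for i in range(size))
               for p in permutations(range(size)))


# Hypothetical query word graph and candidate word graph.
query_nodes, query_degs = [(0.0, 0.0), (1.0, 0.0)], [1, 1]
cand_nodes, cand_degs = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)], [1, 2, 1]
score = approx_edit_distance(query_nodes, query_degs, cand_nodes, cand_degs)
```

A lower score means the candidate word graph is closer to the query; ranking candidates by this score gives a simple retrieval list.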
