Unsupervised word spotting using a graph representation based on invariants

We are developing an interactive word retrieval system for navigating collections of ancient documents, based on query composition and aimed at non-expert users. We have introduced a new notion, invariants: pieces of writing extracted automatically from an old document collection. Invariants can be used in the query-making process, in which the user selects and composes appropriate invariants to form a query. They can also serve as descriptors that characterize word images. Our unsupervised method for extracting invariants was introduced in an earlier paper. In this paper, we present a new structural word spotting system that uses a graph representation with invariants as the descriptor. Our experiments indicate that the proposed system can adapt to different types of homogeneous documents in alphabetic languages, regardless of language or script, age, and whether they are handwritten or printed.
