On Evaluation of Segmentation-Free Word Spotting Approaches without Hard Decisions

Word spotting systems retrieve occurrences of a given keyword in document images without recognizing the full document content. Because segmentation-free word spotting methods are becoming more common, we propose a methodology for evaluating them with measures that take the quality of the retrieved word locations into account without making hard decisions. Using synthetic examples, we derive the desired evaluation behavior and show where existing evaluation methods deviate from it. We introduce new measures that follow this behavior and illustrate their differences with examples. Finally, we apply the proposed evaluation method to a state-of-the-art word spotting approach.
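
The abstract does not define the proposed measures, so the following is only a minimal sketch of the general contrast it describes, not the paper's method. It assumes axis-aligned bounding boxes, uses intersection over union (IoU) as a stand-in for "location quality", and compares a thresholded (hard) precision and recall with an overlap-weighted (soft) variant. The function names, the 0.5 threshold, and the simplification that each box is scored by its best overlap (with no one-to-one matching) are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def best_overlaps(sources: List[Box], targets: List[Box]) -> List[float]:
    """For each source box, the best IoU with any target box (0.0 if none)."""
    return [max((iou(s, t) for t in targets), default=0.0) for s in sources]


def hard_scores(retrieved: List[Box], truth: List[Box],
                threshold: float = 0.5) -> Tuple[float, float]:
    """Precision and recall where a box counts fully or not at all."""
    p = [o >= threshold for o in best_overlaps(retrieved, truth)]
    r = [o >= threshold for o in best_overlaps(truth, retrieved)]
    precision = sum(p) / len(p) if p else 0.0
    recall = sum(r) / len(r) if r else 0.0
    return precision, recall


def soft_scores(retrieved: List[Box], truth: List[Box]) -> Tuple[float, float]:
    """Precision and recall where each box contributes its overlap value."""
    p = best_overlaps(retrieved, truth)
    r = best_overlaps(truth, retrieved)
    precision = sum(p) / len(p) if p else 0.0
    recall = sum(r) / len(r) if r else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = [(0.0, 0.0, 10.0, 10.0)]
    slightly_low = [(0.0, 0.0, 10.0, 4.0)]   # IoU just below the threshold
    slightly_high = [(0.0, 0.0, 10.0, 6.0)]  # IoU just above the threshold
    for name, boxes in [("low", slightly_low), ("high", slightly_high)]:
        print(name, hard_scores(boxes, truth), soft_scores(boxes, truth))
```

In this toy setup, two detections whose overlaps differ only slightly receive opposite scores under the hard decision, while the soft scores change gradually with the overlap. That is the kind of discrepancy a threshold-free evaluation is meant to avoid.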
