A multiple instances approach to improving keyword spotting on historical Mongolian document images

In keyword spotting on historical Mongolian document images, retrieval performance varies considerably depending on which instance image the user provides for the same query keyword. This paper proposes an approach to this problem that divides keyword spotting into two stages. The first stage generates multiple ranking lists for a query keyword, and the second stage merges those lists into a final ranking. In the first stage, an initial ranking list for the query keyword is returned by traditional image matching, and a number of additional instances of the query keyword are then obtained from it using pseudo-relevance feedback. Each of these instances is in turn used as a query and returns its own ranking list. In the second stage, the ranking lists from the multiple instances are combined using a data fusion technique, and the fused ranking is taken as the retrieval result for the query keyword. The experimental results show that the proposed approach can significantly improve keyword spotting performance on historical Mongolian document images.
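The two-stage procedure can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say which data fusion technique is used, so the sketch assumes CombSUM over min-max normalized scores, a common score-based fusion method. The names `match`, `spot_keyword`, `comb_sum`, and `top_k` are hypothetical, and `match` stands in for whatever image matcher produces a similarity score per word image.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {word_image_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every item gets the same normalized score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(ranking_lists):
    """Fuse several score lists by summing normalized scores (CombSUM).

    Assumption: CombSUM is used here only as an example of data fusion.
    Returns (word_image_id, fused_score) pairs, best first.
    """
    fused = defaultdict(float)
    for scores in ranking_lists:
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += s
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


def spot_keyword(query, match, top_k):
    """Two-stage keyword spotting with pseudo-relevance feedback.

    `match(instance)` is a hypothetical matcher returning
    {word_image_id: similarity}, where higher means more similar.
    """
    # Stage 1a: initial ranking list from the user's query instance.
    first = match(query)
    # Stage 1b: treat the top_k results as further instances of the keyword.
    ranked = sorted(first.items(), key=lambda kv: (-kv[1], kv[0]))
    instances = [doc for doc, _ in ranked[:top_k]]
    # Stage 1c: each instance returns its own ranking list.
    lists = [first] + [match(inst) for inst in instances]
    # Stage 2: merge all ranking lists into the final ranking.
    return comb_sum(lists)
```

A design note on the sketch: normalizing each list before summing keeps one instance with an unusually wide score range from dominating the fused ranking. Whether the initial list is included in the fusion, and how many feedback instances are used, are choices the abstract leaves open.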
