Vector Model Based Indexing and Retrieval of Handwritten Medical Forms

This paper presents a vector-model-based approach to information retrieval (IR) over handwritten medical forms. To improve IR performance on the error-prone output of handwriting recognition (HR) systems, we modify the vector model so that the number of occurrences of each term is estimated from word segmentation and recognition probabilities. IR tests show that our approach outperforms retrieval over ordinary HR text in terms of mean average precision (MAP), R-Precision, and 11-point interpolated precision.
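The idea of estimating term occurrences from probabilities can be illustrated with a minimal sketch. The abstract does not give the exact estimator, so the code below assumes one plausible form: each candidate word image contributes the product of its segmentation probability and its recognition probability to the expected count of a term. The names `expected_term_counts` and `cosine`, the input layout, and all the numbers are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def expected_term_counts(word_images):
    """Estimate term counts for one form.

    word_images: list of (seg_prob, {term: rec_prob}) pairs, one per
    candidate word image. ASSUMPTION: the expected count of a term is the
    sum of seg_prob * rec_prob over all candidates; the paper's actual
    estimator may differ.
    """
    counts = defaultdict(float)
    for seg_prob, candidates in word_images:
        for term, rec_prob in candidates.items():
            counts[term] += seg_prob * rec_prob
    return dict(counts)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical recognizer output for one form: three candidate word images,
# each with a segmentation probability and a few lexicon hypotheses.
form = [
    (0.9, {"chest": 0.6, "chess": 0.3}),
    (0.8, {"pain": 0.7, "paid": 0.2}),
    (0.5, {"chest": 0.4}),
]

doc_vec = expected_term_counts(form)
query_vec = {"chest": 1.0, "pain": 1.0}
score = cosine(query_vec, doc_vec)
```

Unlike indexing only the top-ranked recognition result, this keeps lower-ranked hypotheses in the document vector with fractional weight, so a query term can still match when the recognizer's first choice is wrong.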
