Segmentation-Based And Segmentation-Free Methods for Spotting Handwritten Arabic Words

Given a set of handwritten documents, a common goal is to search for a relevant subset. Finding a query word or image in such a set of documents is called word spotting. Spotting handwritten words in documents written in the Latin alphabet, and more recently in Arabic, has received considerable attention. One issue is generating candidate word regions on a page. Attempting to segment the document definitively into such regions (automatic segmentation) can meet with some success, but the performance of the segmentation algorithm is often a limiting factor in spotting performance. Another approach is to scan the page image directly without attempting to produce a definite segmentation. A new algorithm for word spotting and a comparison of recent algorithms that act on previously unsegmented Arabic handwritten text are presented. The algorithms considered are an automated word segmentation method presented previously and a "segmentation-free" algorithm that performs spotting directly on lines of unsegmented text. The segmentation-free approach performs spotting and segmentation concurrently using a sliding window. The spotting method used to judge the performance of the algorithms is character based, but the results are independent of the actual spotting method used. The segmentation-free method performs an average of 5-10% better than the automated segmentation method and has a lower per-query cost on unprocessed images. However, it has a larger per-query cost on preprocessed documents.
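The sliding-window idea can be illustrated with a minimal sketch. It is not the algorithm of the paper: the abstract does not specify window sizes, features, or scoring, so the sketch assumes binary line images stored as lists of 0/1 rows, a query template of the same height, and a plain pixel-mismatch ratio as a stand-in for the character-based spotting score. The names `window_dissimilarity`, `spot_word`, `threshold`, and `step` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: rows of 0/1 pixels (assumed representation)


def window_dissimilarity(line: Image, template: Image, x: int) -> float:
    """Fraction of pixels that differ between the template and the
    window of the line image that starts at column x."""
    height = len(template)
    width = len(template[0])
    mismatches = 0
    for r in range(height):
        for c in range(width):
            if line[r][x + c] != template[r][c]:
                mismatches += 1
    return mismatches / (height * width)


def spot_word(line: Image, template: Image,
              threshold: float = 0.2, step: int = 1) -> List[Tuple[int, float]]:
    """Slide the template across an unsegmented text line and return
    (start_column, score) for every window scoring at or below threshold.
    Each hit is both a spotting result and a candidate word boundary,
    so no separate segmentation pass is needed."""
    if len(line) != len(template):
        raise ValueError("line and template must have the same height")
    width = len(template[0])
    line_width = len(line[0])
    hits = []
    for x in range(0, line_width - width + 1, step):
        score = window_dissimilarity(line, template, x)
        if score <= threshold:
            hits.append((x, score))
    return hits


# Toy example: a 2x3 query pattern embedded at column 2 of a 2x7 line.
template = [[1, 0, 1],
            [0, 1, 0]]
line = [[0, 0, 1, 0, 1, 0, 0],
        [0, 0, 0, 1, 0, 0, 0]]
print(spot_word(line, template))  # [(2, 0.0)]
```

In this sketch every query rescans the whole line, so the work is paid per query. That is one plausible reading of why a segmentation-free method could cost more per query than a segmentation-based one once documents have been preprocessed, though the abstract does not give the details.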
