Word of blobs

In this paper, we present a novel scheme for subdividing a pictorial representation of a word or word-part into a sequence of blobs that resemble the strokes forming the word. The blobs are generated by applying a bank of Gabor filters, which capture the width of the strokes in multiple directions, and segmenting the regions of strong response. From the resulting blobs we extract representative features, which are combined using a bag-of-features model. The proposed scheme is robust, i.e., insensitive to noise, and works directly on grayscale images. It represents the handwritten curves as a sequence of elliptic blobs whose width is similar to that of the original handwriting. We incorporated the proposed approach into a word-spotting procedure and evaluated its performance on Arabic handwritten datasets.
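The blob-generation step can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation: the function names (`gabor_kernel`, `filter_response`, `segment_blobs`), the even (cosine) zero-mean kernel, the parameter values, the relative threshold, and the 8-connected grouping are all assumptions chosen for illustration. The sketch also assumes ink is bright (1.0) on a dark background (0.0), and it stops before feature extraction and bag-of-features.

```python
import math

def gabor_kernel(half, wavelength, theta, sigma, gamma=0.5):
    """Even (cosine) Gabor kernel of side 2*half+1, shifted to zero mean."""
    c, s = math.cos(theta), math.sin(theta)
    k = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * c + y * s    # axis along which the carrier oscillates
            yr = -x * s + y * c   # axis along which the envelope is elongated
            env = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma * sigma))
            row.append(env * math.cos(2 * math.pi * xr / wavelength))
        k.append(row)
    mean = sum(map(sum, k)) / (2 * half + 1) ** 2
    return [[v - mean for v in row] for row in k]

def filter_response(img, kernel):
    """Correlate img with kernel, treating pixels outside the image as 0."""
    h, w, half = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                if 0 <= r + dy < h:
                    for dx in range(-half, half + 1):
                        if 0 <= c + dx < w:
                            acc += img[r + dy][c + dx] * kernel[dy + half][dx + half]
            out[r][c] = acc
    return out

def segment_blobs(img, n_orient=4, half=4, wavelength=6.0, sigma=3.0, rel_thresh=0.5):
    """Return (mask, orient, blobs) for the strong-response regions of the bank."""
    h, w = len(img), len(img[0])
    bank = [gabor_kernel(half, wavelength, math.pi * k / n_orient, sigma)
            for k in range(n_orient)]
    resp = [filter_response(img, kern) for kern in bank]
    # Per pixel: index of the strongest orientation and its response.
    orient = [[max(range(n_orient), key=lambda k: resp[k][r][c]) for c in range(w)]
              for r in range(h)]
    best = [[resp[orient[r][c]][r][c] for c in range(w)] for r in range(h)]
    peak = max(map(max, best))
    mask = [[peak > 0 and v > rel_thresh * peak for v in row] for row in best]
    # Group strong pixels into blobs with an 8-connected flood fill.
    seen, blobs = set(), []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and (r, c) not in seen:
                stack, blob = [(r, c)], []
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    blob.append((y, x))
                    for ny in (y - 1, y, y + 1):
                        for nx in (x - 1, x, x + 1):
                            if (0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                                    and (ny, nx) not in seen):
                                seen.add((ny, nx))
                                stack.append((ny, nx))
                blobs.append(blob)
    return mask, orient, blobs

# Synthetic example: a horizontal stroke, 3 pixels wide, on a 21x21 canvas.
image = [[0.0] * 21 for _ in range(21)]
for r in range(9, 12):
    for c in range(3, 18):
        image[r][c] = 1.0
mask, orient, blobs = segment_blobs(image)
```

The wavelength is set to twice the stroke width so that the positive lobe of the kernel spans the stroke. This is one plausible reading of how a filter can "capture the width of the strokes". In practice an array library would replace the explicit loops, and several wavelengths would be needed to cover varying stroke widths.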
