Shape Descriptor Based Document Image Indexing and Symbol Recognition

In this paper we present a novel shape descriptor based on shape context which, in combination with hierarchical distance-based hashing, is used for indexing and retrieving document images by word and graphical pattern. The descriptor represents the relative arrangement of points sampled on the boundary of an object's shape. We also demonstrate its applicability to the classification of characters and symbols. For indexing, we provide a new formulation of distance-based hierarchical locality-sensitive hashing. Experiments have yielded promising results.
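To make the two ingredients concrete, the following is a minimal sketch of a classical log-polar shape context histogram and of a single line-projection hash bit as used in basic distance-based hashing. It is not the paper's novel descriptor or its hierarchical formulation, neither of which is specified in this abstract. The function names, the bin counts (`n_r`, `n_theta`), the radii (`r_inner`, `r_outer`), the mean-distance normalisation, and the use of Euclidean distance are all illustrative assumptions.

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """For each boundary point, histogram the relative positions of all
    other points in log-polar bins (assumed parameters, not the paper's)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]      # diff[i, j] = pts[j] - pts[i]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    off_diag = ~np.eye(n, dtype=bool)
    dist = dist / dist[off_diag].mean()           # normalise for scale

    # Log-spaced radial bin edges; closer points fall into the innermost bin,
    # points at or beyond r_outer are ignored.
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    r_bin = np.maximum(np.searchsorted(r_edges, dist, side="right") - 1, 0)
    t_bin = np.minimum((angle / (2 * np.pi) * n_theta).astype(int), n_theta - 1)

    valid = off_diag & (r_bin < n_r)
    hist = np.zeros((n, n_r * n_theta))
    rows = np.nonzero(valid)[0]
    cols = (r_bin * n_theta + t_bin)[valid]
    np.add.at(hist, (rows, cols), 1)              # accumulate repeated bins
    return hist

def euclidean(u, v):
    """Stand-in distance; any distance between descriptors could be used."""
    return float(np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)))

def dbh_bit(x, pivot_a, pivot_b, t_low, t_high, dist=euclidean):
    """One hash bit: project x onto the 'line' through two pivot objects
    using only distances, then test whether it lies in [t_low, t_high]."""
    d_ab = dist(pivot_a, pivot_b)
    proj = (dist(x, pivot_a) ** 2 + d_ab ** 2 - dist(x, pivot_b) ** 2) / (2 * d_ab)
    return int(t_low <= proj <= t_high)
```

In such a scheme, several bits computed from different pivot pairs would be concatenated into a hash key, so that similar descriptors tend to land in the same bucket. How the paper selects pivots and thresholds, and how it organises the tables hierarchically, is not covered by this sketch.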
