A Fast Keyword-Spotting Technique

To capture the content of a document image while avoiding full-scale OCR, which is time-consuming and handles touching characters poorly, this paper proposes a fast, segmentation-free keyword-spotting method. The method is based on a word shape coding technique. The proposed coding scheme has little ambiguity and can be executed quickly, which makes it a promising technique for improving document image retrieval. The strength of the method is demonstrated in a document filtering experiment: filtering based on the proposed method is more than 20 times faster than filtering based on OCR, with comparable accuracy.
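
To illustrate the general idea of word shape coding, the following is a minimal sketch, not the coding scheme proposed in the paper (which the abstract does not specify). It assumes a hypothetical four-symbol code alphabet (ascender, descender, dotted, x-height) and operates on character strings as a stand-in for shape features that a real system would extract from word images. The names `shape_code` and `spot` are illustrative only.

```python
# Hypothetical shape classes for lowercase letters. The paper's actual
# code alphabet is not given in the abstract; these are assumptions.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")
DOTTED = set("ij")


def shape_code(word: str) -> str:
    """Map each character of a word to a coarse shape-class symbol."""
    code = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            code.append("A")  # rises above the x-height
        elif ch in DESCENDERS:
            code.append("D")  # drops below the baseline
        elif ch in DOTTED:
            code.append("i")  # carries a dot
        elif ch.isalpha():
            code.append("x")  # fits within the x-height
        # Punctuation and other symbols are ignored in this sketch.
    return "".join(code)


def spot(keyword: str, words: list[str]) -> list[int]:
    """Return the indices of words whose shape code matches the keyword's."""
    target = shape_code(keyword)
    return [i for i, w in enumerate(words) if shape_code(w) == target]


words = ["word", "shape", "coding", "shops"]
print(shape_code("shape"))   # xAxDx
print(spot("shape", words))  # [1, 3]
```

In this toy alphabet "shape" and "shops" share the code `xAxDx`, so both are reported as matches. Collisions of this kind are the ambiguity a practical coding scheme tries to keep low.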
