Image Based Word Retrieval Method for Unrestricted Textline Direction Documents

Being able to run full-text searches over document information accumulated in the past is very important. Retrieval technologies for ASCII text documents are well established, but highly precise character retrieval from image-based documents, such as bitmap images, is not easy. This paper proposes a word retrieval technique for bitmap images of Japanese documents with various layouts. The technique consists of a character sequence extraction stage and a word retrieval stage. Experiments on actual documents that mix vertical and horizontal writing show that the proposed technique is effective.
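The abstract does not say how either stage works internally. What follows is a minimal Python sketch of one plausible two-stage pipeline. It is an illustration only, not the authors' method. It assumes that character positions, sizes, and top-N recognition candidates have already been obtained from segmentation and OCR. The names `Char`, `extract_sequences`, `retrieve`, and the `gap` threshold are all hypothetical. Stage 1 links each character to a nearby successor either to its right (horizontal line) or below it (vertical line), so text-line direction does not have to be known in advance. Stage 2 accepts a match whenever every query character appears among the candidates at the corresponding position.

```python
from dataclasses import dataclass


@dataclass
class Char:
    x: float          # character center x (hypothetical input from segmentation)
    y: float          # character center y
    size: float       # approximate character size
    candidates: set   # hypothetical top-N OCR candidates for this character


def extract_sequences(chars, gap=1.5):
    """Stage 1 (hypothetical): chain characters into sequences without
    assuming a text-line direction. Each character links to its nearest
    successor to the right or below, within gap * size."""
    n = len(chars)
    nxt = [None] * n
    has_prev = [False] * n
    for i, c in enumerate(chars):
        best, best_d = None, None
        for j, d in enumerate(chars):
            if i == j:
                continue
            dx, dy = d.x - c.x, d.y - c.y
            # Successor must sit on roughly the same line, to the right or below.
            horizontal = dx > 0 and abs(dy) < 0.5 * c.size
            vertical = dy > 0 and abs(dx) < 0.5 * c.size
            if not (horizontal or vertical):
                continue
            dist = abs(dx) + abs(dy)
            if dist <= gap * c.size and (best_d is None or dist < best_d):
                best, best_d = j, dist
        # Each character may be claimed as a successor only once.
        if best is not None and not has_prev[best]:
            nxt[i] = best
            has_prev[best] = True
    # Walk each chain from its head (a character with no predecessor).
    seqs = []
    for i in range(n):
        if has_prev[i]:
            continue
        seq, k = [], i
        while k is not None:
            seq.append(chars[k])
            k = nxt[k]
        seqs.append(seq)
    return seqs


def retrieve(word, seqs):
    """Stage 2 (hypothetical): report (sequence index, start position) for
    every place where each query character is among the OCR candidates."""
    hits = []
    for s_idx, seq in enumerate(seqs):
        for start in range(len(seq) - len(word) + 1):
            if all(word[k] in seq[start + k].candidates for k in range(len(word))):
                hits.append((s_idx, start))
    return hits


# A horizontal line and a separate vertical column on the same page.
# The third horizontal character is ambiguous between two OCR candidates.
line = [Char(10 * i, 0, 10, set(c)) for i, c in enumerate(["文", "書", "画面", "像"])]
column = [Char(100, 50 + 10 * i, 10, set(c)) for i, c in enumerate(["検", "索", "語"])]
seqs = extract_sequences(line + column)
print(retrieve("画像", seqs))  # [(0, 2)]
print(retrieve("検索", seqs))  # [(1, 0)]
```

Matching against a candidate set instead of a single OCR result is one common way to tolerate recognition errors. In this sketch, the query 面像 would also match at `(0, 2)`.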
