Document image retrieval without OCRing using a video scanning system

In this paper, we propose a technique for efficient document retrieval from digital libraries of token-based compressed document images. The query image is captured from a paper document with the video scanning tool of a multimedia system. The proposed technique uses the layout information given by the relative positions of the character tokens on a page of the “query” paper document to retrieve the original document from the image database. It requires no OCR of either the query document or the documents in the database; moreover, it does not decompress the token-based compressed documents in the database, and therefore yields significant savings in time and computation.
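
The layout-matching idea can be illustrated with a minimal sketch. This is not the authors' method, only an example under stated assumptions: each database page is assumed to be available as a list of token centroid coordinates (a token-based compressed format stores a position for every token occurrence, so no bitmap decompression is needed), the captured query is assumed to have the same scale and orientation as the stored pages, and layouts are compared with a directed Hausdorff distance minimized over candidate translations, a matching rule chosen here for illustration. The names `directed_hausdorff`, `layout_score`, and `retrieve` and the sample coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout representation: token centroids (x, y) in pixels.
# Assumes query and database pages share the same scale and orientation.
Point = Tuple[float, float]


def directed_hausdorff(a: List[Point], b: List[Point]) -> float:
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def layout_score(query: List[Point], page: List[Point]) -> float:
    """Best directed Hausdorff distance from query to page over the
    translations that place the first query token on some page token."""
    qx0, qy0 = query[0]
    best = math.inf
    for px, py in page:
        dx, dy = px - qx0, py - qy0
        shifted = [(x + dx, y + dy) for x, y in query]
        best = min(best, directed_hausdorff(shifted, page))
    return best


def retrieve(query: List[Point], database: Dict[str, List[Point]]) -> List[str]:
    """Document IDs ordered from best (lowest score) to worst layout match."""
    return sorted(database, key=lambda doc_id: layout_score(query, database[doc_id]))


# Illustrative data (made up): token centroids of two stored pages.
database = {
    "doc_a": [(10, 10), (20, 10), (35, 10), (10, 30), (25, 30)],
    "doc_b": [(10, 10), (18, 10), (40, 10), (12, 30), (30, 30)],
}
# A partial capture of doc_a, offset by (-5, -5) relative to the stored page.
query = [(5, 5), (15, 5), (30, 5), (5, 25)]
print(retrieve(query, database))  # prints ['doc_a', 'doc_b']
```

Because the query is a pure translation of part of the doc_a layout, its score against doc_a is 0 and doc_a ranks first. The brute-force search over translations costs roughly O(|Q|·|P|²) distance evaluations per page, so a practical system would likely index layouts rather than scan every page.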

Ercan E. Kuruoglu | Vern T. Tan
