Measuring Structural Similarity of Document Pages for Searching Document Image Databases

Current document management and database systems provide text search and retrieval capabilities, but generally lack the ability to use the documents' logical and physical structures. This paper describes a general system for document image retrieval that makes use of document structure. It discusses the use of structural similarity for retrieval, defines a measure of structural similarity between document images based on content area overlap, and compares similarity ratings based on this measure with human relevance judgments.
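The abstract does not state the formula for the overlap-based measure, so the following is only an illustrative sketch of what such a measure could look like, not the paper's definition. It assumes each page is described by a list of axis-aligned content rectangles in page-normalized coordinates, and it scores two pages by the area their content regions share divided by the area either one covers. The names `Rect`, `rasterize`, and `overlap_similarity`, the grid resolution, and the intersection-over-union choice are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (x0, y0, x1, y1), normalized to the unit page.
Rect = Tuple[float, float, float, float]


def rasterize(rects: List[Rect], grid: int = 100) -> Set[Tuple[int, int]]:
    """Return the grid cells whose centers fall inside any content rectangle."""
    cells: Set[Tuple[int, int]] = set()
    for x0, y0, x1, y1 in rects:
        for i in range(grid):
            cx = (i + 0.5) / grid  # horizontal center of column i
            if not (x0 <= cx < x1):
                continue
            for j in range(grid):
                cy = (j + 0.5) / grid  # vertical center of row j
                if y0 <= cy < y1:
                    cells.add((i, j))
    return cells


def overlap_similarity(page_a: List[Rect], page_b: List[Rect], grid: int = 100) -> float:
    """Shared content area divided by total covered area (assumed measure)."""
    a = rasterize(page_a, grid)
    b = rasterize(page_b, grid)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty pages count as identical
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Two single-block layouts whose text areas are shifted vertically.
    query = [(0.0, 0.0, 1.0, 0.5)]
    candidate = [(0.0, 0.25, 1.0, 0.75)]
    print(overlap_similarity(query, candidate))
```

Rasterizing onto a grid keeps the sketch short and handles overlapping rectangles within a page without double counting; an exact geometric computation would be a reasonable alternative when precision matters more than simplicity.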
