Searching in document images: what does the appearance of a document tell us about what it means?

The document understanding problem can be informally defined as the automatic extraction of meaning from documents. In the Intelligent Sensory Information Systems group we have experimented with analyzing the visual appearance of documents in order to extract meaning. That is, we concentrate on how documents look rather than on what they say. We motivate this approach with three applications from document image understanding. First, we describe how document genre classification can be used to group visually similar documents, which simplifies the analysis task for an entire class of documents. Second, we consider the logical block labeling problem and show how logical labels (e.g., title, author, header, footer) can be assigned to blocks of text using a few visual features. Third, we discuss our approach to detecting the reading order of text from the visual structure of a document. These examples build on work in content-based image retrieval (CBIR), which aims at searching and browsing image repositories on the basis of a visual specification of the query. The query may consist of one or, preferably, several example images, and the results may be presented as a linear list of items or, preferably, as a grouping by similarity. Our research on colour representations of real-world objects and on the colour composition of documents has shown that CBIR techniques can be successfully applied to simplify the document understanding problem.
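As a rough illustration of query by example with several example images and a linear result list, the sketch below ranks a toy repository by colour similarity. It is not the method used in this work: the coarse RGB histogram, the histogram-intersection score, the averaging over examples, and all names (`colour_histogram`, `intersection`, `rank_by_examples`, the toy pixel data) are assumptions chosen for brevity.

```python
from collections import Counter

def colour_histogram(pixels, bins=4):
    """Quantise (r, g, b) pixels (0-255 per channel) into bins**3 cells, normalised to sum to 1.

    Assumes a non-empty pixel list.
    """
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {cell: n / total for cell, n in counts.items()}

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(v, h2.get(cell, 0.0)) for cell, v in h1.items())

def rank_by_examples(examples, repository):
    """Return (name, score) pairs sorted by mean similarity to all query examples."""
    query_hists = [colour_histogram(p) for p in examples]
    scores = {}
    for name, pixels in repository.items():
        h = colour_histogram(pixels)
        scores[name] = sum(intersection(q, h) for q in query_hists) / len(query_hists)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: each "document image" is just a short list of pixels.
repository = {
    "mostly_white_page": [(250, 250, 250)] * 9 + [(10, 10, 10)],
    "red_brochure": [(200, 20, 20)] * 8 + [(250, 250, 250)] * 2,
}
query_examples = [[(255, 255, 255)] * 8 + [(0, 0, 0)] * 2]

ranking = rank_by_examples(query_examples, repository)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Passing more than one example image simply averages the per-example scores; a similarity grouping would instead cluster the repository on the same pairwise scores.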
