Paper information - Use of document structure analysis to retrieve information from documents in digital libraries

This paper describes an approach to retrieving information from document images stored in a digital library by means of knowledge-based layout analysis and logical structure derivation techniques. Queries on document image content are categorized in terms of the type of information that is desired, and are parsed to determine the type of document from which information is desired, the syntactic level of the information desired, and the level of analysis required to extract the information. Using these clauses in the query, a set of salient documents is retrieved, layout analysis and logical structure derivation are performed on the retrieved documents, and the documents are then analyzed in detail to extract the relevant logical components. A 'document browser' application, being developed based on this approach, allows a user to interactively specify queries on the documents in the digital library using a graphical user interface, provides feedback about the candidate documents at each stage of the retrieval process, and allows refinements of the query based on the intermediate results of the search. Results of a query are displayed either as an image or as formatted text.

Sargur N. Srihari | Debashish Niyogi

[1] Rohini K. Srihari, et al. Piction: A System That Uses Captions to Label Human Faces in Newspaper Photographs, 1991, AAAI.

[2] Venu Govindaraju, et al. Analysis of Printed Forms, 1997.

[3] Leslie Lamport, et al. LaTeX: A Document Preparation System, 1985.

[4] Sargur N. Srihari, et al. An integrated approach to document decomposition and structural analysis, 1996, Int. J. Imaging Syst. Technol.

[5] Sargur N. Srihari, et al. Knowledge-based derivation of document logical structure, 1995, Proceedings of 3rd International Conference on Document Analysis and Recognition.

[6] Michael S. Landy, et al. HIPS: A Unix-based image processing system, 1984, Comput. Vis. Graph. Image Process.

[7] Debashish Niyogi, et al. A knowledge-based approach to deriving logical structure from document images, 1995.

[8] Martin D. Levine, et al. Low Level Image Segmentation: An Expert System, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Sargur N. Srihari, et al. Intelligent Data Retrieval from Raster Images of Documents, 1994.

[10] Sargur N. Srihari, et al. A Rule-Based System for Document Understanding, 1986, AAAI.