ANASTASIL: A System for Low-Level and High-Level Geometric Analysis of Printed Documents

This paper focuses on the knowledge-based document analysis system ANASTASIL (Analysis System to Interpret Areas in Single-sided Letters). The system identifies the important conceptual parts (logical objects) of business letters, such as the recipient, the sender, or company-specific printings. It works completely independently of text recognition and relies only on geometric knowledge sources. There are two such sources: global geometric knowledge about how logical objects are arranged, and local geometric knowledge about the formal features of individual logical objects (e.g., their extents and typical font sizes). The system hypothesizes and tests geometric properties of the captured physical units (layout objects) and then classifies a document image by labeling its areas with the corresponding logical object designators. Because of this strategy, ANASTASIL can serve as the starting point for subsequent expectation-driven analysis of logical objects by text or graphics recognition. The system has been completely implemented and has achieved some remarkable results. It consists of a low-level geometric analysis module for image processing tasks and a high-level geometric analysis module that performs the logical labeling of layout objects. It was implemented in C and Common Lisp on a SUN 3/60 workstation and will soon be available in the Macintosh environment.
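The abstract describes the high-level labeling step only in prose. The following Python sketch illustrates the general hypothesize-and-test idea with purely geometric features. It is not ANASTASIL's method, which was written in C and Common Lisp. All names are hypothetical: the `LayoutObject` fields, the feature ranges in `LOCAL`, the arrangement rules in `ABOVE`, and the functions `hypotheses`, `consistent` and `label_page`. Exhaustive enumeration stands in for whatever control strategy the real system uses. Local knowledge proposes candidate labels per block, and global knowledge then filters out combinations whose vertical arrangement is implausible.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LayoutObject:
    """Hypothetical layout object from a low-level module (page-relative units)."""
    name: str
    x: float        # left edge, fraction of page width
    y: float        # top edge, fraction of page height
    w: float        # width, fraction of page width
    h: float        # height, fraction of page height
    font_pt: float  # dominant font height in points


# Hypothetical LOCAL knowledge: admissible ranges of features per logical object.
LOCAL = {
    "sender":    {"x": (0.0, 1.0), "y": (0.00, 0.15), "font_pt": (6.0, 10.0)},
    "recipient": {"x": (0.0, 0.5), "y": (0.10, 0.35), "font_pt": (9.0, 13.0)},
    "date":      {"x": (0.5, 1.0), "y": (0.10, 0.40), "font_pt": (9.0, 13.0)},
    "body":      {"x": (0.0, 0.3), "y": (0.30, 0.90), "font_pt": (9.0, 13.0)},
}

# Hypothetical GLOBAL knowledge: (upper, lower) means upper ends above lower starts.
ABOVE = [("sender", "recipient"), ("recipient", "body"), ("date", "body")]


def hypotheses(obj):
    """Hypothesize: labels whose local feature ranges the object satisfies."""
    return [label for label, ranges in LOCAL.items()
            if all(lo <= getattr(obj, feat) <= hi for feat, (lo, hi) in ranges.items())]


def consistent(assign):
    """Test: check the global arrangement for every pair of assigned labels."""
    placed = {label: obj for obj, label in assign.items() if label is not None}
    return all(placed[a].y + placed[a].h <= placed[b].y
               for a, b in ABOVE if a in placed and b in placed)


def label_page(objs):
    """Enumerate candidate labelings (None = unlabeled), keep those with unique
    labels that pass the global test, and return the one labeling most objects."""
    options = [hypotheses(o) + [None] for o in objs]
    best, best_count = {}, -1
    for combo in product(*options):
        used = [label for label in combo if label is not None]
        if len(used) != len(set(used)):
            continue  # each logical object occurs at most once per letter
        assign = dict(zip(objs, combo))
        if consistent(assign) and len(used) > best_count:
            best, best_count = assign, len(used)
    return {obj.name: label for obj, label in best.items()}


page = [
    LayoutObject("A", 0.10, 0.03, 0.80, 0.05, 8.0),   # thin strip at the top
    LayoutObject("B", 0.10, 0.18, 0.35, 0.10, 11.0),  # left block near the top
    LayoutObject("C", 0.65, 0.20, 0.25, 0.03, 11.0),  # short line on the right
    LayoutObject("D", 0.10, 0.40, 0.80, 0.45, 11.0),  # large text area
]
print(label_page(page))
# → {'A': 'sender', 'B': 'recipient', 'C': 'date', 'D': 'body'}
```

Exhaustive enumeration grows exponentially with the number of blocks. That is acceptable for the handful of blocks on a single-sided letter, but a real system would need a more selective search or an evidence-combination scheme to rank competing hypotheses.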
