Construction of generic models of document structures using inference of tree grammars

Using a generic model of a document class as the knowledge base of a Document Analysis System facilitates the analysis and understanding of documents belonging to that class. Nevertheless, the absence of tools for acquiring this type of model is a hindrance to the design of entirely automatic systems. In this paper, we present a method for acquiring the generic model of a document class from sample documents belonging to that class. Our method is based on the inference of tree grammars and on the combination of ODA-like generic constructors. The method constructs a specific physical structure for each sample and invites the user to assign logical labels to its components. From these logically labeled specific structures, it generates the generic model for the class under treatment and then modifies it as further samples are processed.
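As an illustration of the general idea only, the following minimal sketch merges a few logically labeled specific structures into one rule per label. It is not the inference algorithm of the paper: the tuple representation of trees, the function names, and the qualifier names `REQ`, `OPT`, `REP`, and `CHO` are assumptions chosen for the example, loosely inspired by ODA-style constructors. It also recomputes the model from all samples rather than updating it incrementally.

```python
from collections import defaultdict

# Hypothetical representation: a specific structure is a (label, children) tuple.

def collapse_repeats(labels):
    """Collapse runs of one label into a single (label, repeated) item."""
    out = []
    for lab in labels:
        if out and out[-1][0] == lab:
            out[-1] = (lab, True)
        else:
            out.append((lab, False))
    return out

def collect(tree, observed):
    """Record the collapsed child-label sequence seen under every inner node."""
    label, children = tree
    if children:
        observed[label].append(collapse_repeats([c[0] for c in children]))
    for child in children:
        collect(child, observed)

def align(seq, template):
    """Greedily match seq to template positions; None if seq is not a subsequence."""
    positions, start = [], 0
    for lab in seq:
        try:
            pos = template.index(lab, start)
        except ValueError:
            return None
        positions.append(pos)
        start = pos + 1
    return positions

def generalize(sequences):
    """Merge the child sequences observed for one label into a single rule."""
    # Assumption: the longest observed sequence serves as the template.
    template = [lab for lab, _ in max(sequences, key=len)]
    alignments = [align([lab for lab, _ in s], template) for s in sequences]
    if any(a is None for a in alignments):
        # Orders conflict: fall back to a choice between distinct alternatives.
        alternatives = []
        for s in sequences:
            alt = [lab for lab, _ in s]
            if alt not in alternatives:
                alternatives.append(alt)
        return ("CHO", alternatives)
    seen = [0] * len(template)
    repeated = [False] * len(template)
    for s, a in zip(sequences, alignments):
        for (_, rep), pos in zip(s, a):
            seen[pos] += 1
            repeated[pos] = repeated[pos] or rep
    names = {(False, False): "REQ", (True, False): "OPT",
             (False, True): "REP", (True, True): "OPT REP"}
    items = []
    for lab, count, rep in zip(template, seen, repeated):
        optional = count < len(sequences)  # missing from at least one sample
        items.append((lab, names[(optional, rep)]))
    return ("SEQ", items)

def infer_generic_model(samples):
    """Build one rule per label from a list of labeled specific structures."""
    observed = defaultdict(list)
    for tree in samples:
        collect(tree, observed)
    return {label: generalize(seqs) for label, seqs in observed.items()}

# Two made-up labeled samples of the same class.
s1 = ("article", [("title", []), ("body", [("par", []), ("par", [])])])
s2 = ("article", [("title", []), ("author", []), ("body", [("par", [])])])
model = infer_generic_model([s1, s2])
print(model["article"])  # ('SEQ', [('title', 'REQ'), ('author', 'OPT'), ('body', 'REQ')])
```

In this sketch a component becomes optional when at least one sample lacks it and repetitive when at least one sample contains a run of it; a real tree grammar inference procedure would generalize more carefully.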
