Recognizing the structure of a document means discriminating its layout structure, i.e., the two-dimensional configuration and format of the document, and identifying the individual item data. Most studies so far, however, rely on a paradigm in which information about the document structure is defined beforehand for a particular type of document and used as a knowledge base. Such a paradigm succeeds in recognizing documents with the same structure, or structures of the same kind, but is not applicable when various kinds of document structures are mixed.
This paper takes table-form documents as the objects of processing and reports a method that can recognize the document structures of various kinds of table-form documents. Table-form documents fall into various classes, with differing configurations and contents, according to their use and the adjacency relationships between item fields. To recognize the document structure accurately across these classes, the processing method must be based on the information for each class of table-form documents. For this purpose, a classification tree is used, which hierarchically manages the information for each class of table-form documents.
A structure recognition system for multiple kinds of table-form documents is realized with this framework. It covers recognition of the table-form document class, automatic acquisition of layout structure information, and recognition of the document structure.
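
The role of the classification tree can be illustrated with a minimal sketch. It is not the paper's implementation: the feature tuple, the `Node` and `ClassificationTree` names, and the shape of the stored layout knowledge are all assumptions made for illustration. The sketch only shows the general idea of hierarchically indexing per-class layout information, looking up the class of an incoming form, and acquiring a new class when no known class matches.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical feature tuple describing a form from coarse to fine,
# e.g. (number of field rows, number of field columns, adjacency signature).
Features = Tuple[object, ...]


@dataclass
class Node:
    # One child per observed feature value at this level of the hierarchy.
    children: Dict[object, "Node"] = field(default_factory=dict)
    # Layout knowledge for a document class; set only where a class ends.
    layout: Optional[dict] = None


class ClassificationTree:
    """Illustrative tree that indexes layout knowledge by document class."""

    def __init__(self) -> None:
        self.root = Node()

    def classify(self, features: Features) -> Optional[dict]:
        """Return the stored layout for a known class, or None if unknown."""
        node = self.root
        for value in features:
            node = node.children.get(value)
            if node is None:
                return None
        return node.layout

    def acquire(self, features: Features, layout: dict) -> None:
        """Register a new class by storing its layout under its features."""
        node = self.root
        for value in features:
            node = node.children.setdefault(value, Node())
        node.layout = layout

    def recognize(self, features: Features, layout_if_new: dict):
        """Classify a form; if its class is unknown, acquire it.

        Returns (layout, was_new).
        """
        known = self.classify(features)
        if known is not None:
            return known, False
        self.acquire(features, layout_if_new)
        return layout_if_new, True
```

In this sketch, a form whose features match no stored class triggers acquisition, so that later forms of the same class are recognized from the stored layout rather than from a predefined knowledge base.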