A two-level knowledge approach for understanding documents of a multi-class domain

This paper presents an architecture for understanding the documents of a domain in which documents can be grouped into classes according to their physical structure. The architecture relies on two knowledge descriptions of the domain: one that is independent of the classes and one that is specific to them. These two knowledge levels are used to understand the documents of the domain. The understanding phase is described in relation to the preceding phases of document analysis and classification.
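The abstract gives no implementation details. What follows is a minimal Python sketch, built entirely on assumptions, of one way the two knowledge levels might work together after analysis and classification. The grid-cell layout representation, the class names `invoice_a` and `invoice_b`, the regex rules, and the functions `classify` and `understand` are hypothetical and do not come from the paper. Physical structure is reduced to the set of coarse grid cells that the layout blocks occupy.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A layout block, assumed to be the output of an (unshown) analysis phase."""
    row: int   # coarse grid row of the block (hypothetical representation)
    col: int   # coarse grid column of the block
    text: str


# Class-independent knowledge (hypothetical): text patterns valid in every class.
GENERIC_KNOWLEDGE = {
    "date": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
    "amount": re.compile(r"^\d+\.\d{2}$"),
}

# Class-dependent knowledge (hypothetical): for each class, the grid cells its
# physical layout occupies and the logical label expected in each cell.
CLASS_KNOWLEDGE = {
    "invoice_a": {(0, 0): "supplier", (0, 2): "date", (3, 2): "total"},
    "invoice_b": {(0, 2): "supplier", (1, 0): "date", (4, 0): "total"},
}


def classify(blocks):
    """Pick the class whose layout template shares the most cells with the document."""
    cells = {(b.row, b.col) for b in blocks}
    return max(CLASS_KNOWLEDGE, key=lambda c: len(cells & set(CLASS_KNOWLEDGE[c])))


def understand(blocks, doc_class):
    """Label each block with class knowledge first, then fall back to generic knowledge."""
    template = CLASS_KNOWLEDGE[doc_class]
    labelled = []
    for b in blocks:
        label = template.get((b.row, b.col))
        if label is None:
            # No class-specific expectation for this cell: try class-independent rules.
            label = next((name for name, pat in GENERIC_KNOWLEDGE.items()
                          if pat.match(b.text)), "unknown")
        labelled.append((b.text, label))
    return labelled


doc = [Block(0, 0, "ACME Ltd"), Block(0, 2, "12/03/1998"),
       Block(3, 2, "150.00"), Block(5, 1, "01/04/1998")]
doc_class = classify(doc)
print(doc_class, understand(doc, doc_class))
```

Trying the class-specific template first and falling back to the class-independent rules is only one plausible way to combine the two levels. The paper may order or merge them differently.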
