Content features for logical document labeling

Content features extracted from recognized text are valuable for labeling the logical elements of documents that lack a rigid layout structure, such as business letters. This paper discusses a model-based approach that combines content features with geometrical and presentation features for logical labeling. The models are initialized automatically and improved adaptively from training samples. Satisfactory experimental results are presented.
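The abstract does not describe the model itself, so what follows is only a minimal sketch under stated assumptions. Each logical label gets a hypothetical model that keeps running Gaussian statistics for geometrical and presentation features (block position, font size, boldness) and Laplace-smoothed word frequencies for content features. A block receives the label whose model scores it highest. The names `Block`, `LabelModel`, `train`, and `label_block`, the feature set, and the variance floor are illustrative choices, not the paper's method.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    """A text block from a recognized page (hypothetical representation)."""
    x: float          # geometric: left edge, normalized to page width
    y: float          # geometric: top edge, normalized to page height
    font_size: float  # presentation feature
    bold: bool        # presentation feature
    text: str         # content: OCR output


def numeric_features(b: Block) -> list[float]:
    return [b.x, b.y, b.font_size, 1.0 if b.bold else 0.0]


def words(b: Block) -> list[str]:
    return [w.strip(".,:;").lower() for w in b.text.split() if w.strip(".,:;")]


@dataclass
class LabelModel:
    """Assumed per-label model: running mean/variance for numeric features,
    word counts for content features."""
    n: int = 0
    mean: list[float] = field(default_factory=list)
    m2: list[float] = field(default_factory=list)
    word_counts: Counter = field(default_factory=Counter)

    def update(self, b: Block) -> None:
        """Fold one training sample into the model (Welford's online update)."""
        feats = numeric_features(b)
        if self.n == 0:  # first sample initializes the model
            self.mean = [0.0] * len(feats)
            self.m2 = [0.0] * len(feats)
        self.n += 1
        for i, v in enumerate(feats):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])
        self.word_counts.update(words(b))

    def score(self, b: Block, vocab_size: int, var_floor: float = 1e-2) -> float:
        """Log-likelihood-style score summing geometric, presentation, and content terms."""
        s = 0.0
        for i, v in enumerate(numeric_features(b)):
            var = max(self.m2[i] / self.n, var_floor)  # floor avoids zero variance
            s += -0.5 * math.log(2 * math.pi * var) - (v - self.mean[i]) ** 2 / (2 * var)
        total = sum(self.word_counts.values())
        for w in words(b):
            # Laplace smoothing so unseen words do not zero out the score
            s += math.log((self.word_counts[w] + 1) / (total + vocab_size))
        return s


def train(samples: list[tuple[Block, str]]) -> dict[str, LabelModel]:
    models: dict[str, LabelModel] = {}
    for block, label in samples:
        models.setdefault(label, LabelModel()).update(block)
    return models


def label_block(b: Block, models: dict[str, LabelModel]) -> str:
    vocab: set[str] = set(words(b))
    for m in models.values():
        vocab.update(m.word_counts)
    return max(models, key=lambda lbl: models[lbl].score(b, len(vocab)))


if __name__ == "__main__":
    samples = [
        (Block(0.1, 0.05, 14, True, "ACME Corporation"), "letterhead"),
        (Block(0.1, 0.06, 13, True, "ACME Industries Ltd"), "letterhead"),
        (Block(0.1, 0.30, 10, False, "Dear Ms. Smith,"), "salutation"),
        (Block(0.1, 0.32, 10, False, "Dear Sir,"), "salutation"),
        (Block(0.1, 0.85, 10, False, "Sincerely, John Doe"), "closing"),
        (Block(0.1, 0.88, 10, False, "Yours sincerely,"), "closing"),
    ]
    models = train(samples)
    print(label_block(Block(0.1, 0.31, 10, False, "Dear Mr. Jones,"), models))
```

Because each model is created from its first labeled sample and refined with every further one, this mirrors the automatic initialization and adaptive improvement the abstract mentions, though the paper's actual models and scoring may differ.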
