Computer understanding of document structure

We describe a system that learns the presentation of a document's logical structure, exemplified here for business letters. Given a set of instances, the system clusters them into structural concepts and induces a concept hierarchy, which then serves as a reference for classifying future input. The article introduces the sequence of learning steps, describes how the resulting concept hierarchy is applied to logical labeling, and reports the results. © 1996 John Wiley & Sons, Inc.
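The abstract does not specify how instances are represented or which clustering algorithm is used. As a rough illustration of the general idea only, and not the authors' method, the sketch below assumes each layout block of a letter is a hypothetical feature vector (normalized x, y, width, height). It induces a concept hierarchy by plain agglomerative centroid clustering, labels each concept with the majority logical label of its members, and classifies new input by descending toward the nearest child concept. The names `Concept`, `induce_hierarchy`, and `classify`, and the sample data, are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import dist
from typing import List, Tuple

# ASSUMPTION: a layout block is described by a hypothetical feature vector
# (x, y, width, height), each value normalized to the page size.
Features = Tuple[float, ...]
Instance = Tuple[Features, str]  # (features, logical label such as "date")


@dataclass
class Concept:
    """A node in the concept hierarchy: a cluster of training instances."""
    members: List[Instance]
    children: List["Concept"] = field(default_factory=list)

    @property
    def centroid(self) -> Features:
        n = len(self.members)
        dims = len(self.members[0][0])
        return tuple(sum(f[d] for f, _ in self.members) / n for d in range(dims))

    @property
    def label(self) -> str:
        # Majority logical label among the instances this concept covers.
        return Counter(lbl for _, lbl in self.members).most_common(1)[0][0]


def induce_hierarchy(instances: List[Instance]) -> Concept:
    """Agglomerative clustering: repeatedly merge the two closest concepts."""
    concepts = [Concept(members=[inst]) for inst in instances]
    while len(concepts) > 1:
        # Find the pair of concepts whose centroids are closest.
        best = None
        for i in range(len(concepts)):
            for j in range(i + 1, len(concepts)):
                d = dist(concepts[i].centroid, concepts[j].centroid)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        a, b = concepts[i], concepts[j]
        # Delete the higher index first so index i stays valid.
        del concepts[j]
        del concepts[i]
        concepts.append(Concept(members=a.members + b.members, children=[a, b]))
    return concepts[0]


def classify(root: Concept, x: Features) -> str:
    """Descend toward the nearest child concept until a leaf is reached."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: dist(c.centroid, x))
    return node.label


if __name__ == "__main__":
    # Hypothetical training blocks from two letters.
    training = [
        ((0.10, 0.05, 0.30, 0.06), "sender"),
        ((0.12, 0.06, 0.28, 0.05), "sender"),
        ((0.10, 0.20, 0.35, 0.08), "recipient"),
        ((0.11, 0.22, 0.33, 0.09), "recipient"),
        ((0.70, 0.30, 0.20, 0.02), "date"),
        ((0.72, 0.31, 0.18, 0.02), "date"),
    ]
    root = induce_hierarchy(training)
    print(classify(root, (0.71, 0.30, 0.19, 0.02)))  # → date
```

Greedy descent keeps classification cheap but can misroute an instance that lies near a cluster boundary, and the naive pairwise merge loop scales poorly; both are acceptable only for illustration.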