Knowledge-Based System for Structured Document Recognition

This paper describes a document analysis system broadly consisting of a knowledge base, a blackboard, and a set of tasks, each with its own specialists for segmentation, recognition, and inheritance. The knowledge base contains a generic hierarchical description of the document structure in terms of logically labeled layout objects. This allows hypothetical networks of linked objects to be generated in the blackboard. The specialists cooperate indirectly through the blackboard by updating the layout object descriptors. A blackboard modification causes an "event" to propagate up to specific tasks. A task can then choose another subset of specialists to carry on with the process. Finally, a synthesized blackboard summary allows a task selector to focus efficiently on the most useful layout object to process.

GRAPHEIN is a general-purpose system that can deal effectively with a variety of document classes. It is able to organize and control the diverse document recognition processes in a flexible and efficient manner. Section 2 presents the classes of document structure adopted and the knowledge sources taken into account in the GRAPHEIN project. The system architecture and the control structure are detailed in Sections 3 and 4, respectively. Finally, we conclude with a discussion of the suitability of such an architecture and propose further improvements.

2 Document structures
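To make the overview above more concrete, the following minimal Python sketch models a generic hierarchical description with logically labeled objects, its expansion into a network of linked hypotheses on a blackboard, and specialists that cooperate only by updating layout object descriptors. It is an illustration under stated assumptions, not GRAPHEIN's implementation: every name in it (GenericObject, LayoutObject, Blackboard, segment, recognize, recognition_task, run) is hypothetical, the specialists write placeholder values, the "summary" is reduced to a list of unprocessed leaf objects, and inheritance specialists are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GenericObject:
    """Node of the generic, class-level document description (assumed shape)."""
    label: str  # logical label such as "header" or "body"
    children: List["GenericObject"] = field(default_factory=list)


@dataclass(eq=False)
class LayoutObject:
    """Hypothesized layout object stored on the blackboard."""
    label: str
    parent: Optional["LayoutObject"] = field(default=None, repr=False)
    children: List["LayoutObject"] = field(default_factory=list)
    descriptor: Dict[str, object] = field(default_factory=dict)


Listener = Callable[["Blackboard", LayoutObject, str], None]


class Blackboard:
    def __init__(self) -> None:
        self.objects: List[LayoutObject] = []
        self.listeners: List[Listener] = []  # tasks subscribed to events

    def instantiate(self, generic: GenericObject,
                    parent: Optional[LayoutObject] = None) -> LayoutObject:
        """Expand the generic description into a network of linked hypotheses."""
        obj = LayoutObject(generic.label, parent)
        self.objects.append(obj)
        if parent is not None:
            parent.children.append(obj)
        for child in generic.children:
            self.instantiate(child, obj)
        return obj

    def update(self, obj: LayoutObject, key: str, value: object) -> None:
        """Write to a descriptor, then propagate an event to subscribed tasks."""
        obj.descriptor[key] = value
        for listener in list(self.listeners):
            listener(self, obj, key)

    def pending(self) -> List[LayoutObject]:
        """Toy summary: leaf objects that still lack recognized text."""
        return [o for o in self.objects
                if not o.children and "text" not in o.descriptor]


def segment(bb: Blackboard, obj: LayoutObject) -> None:
    # Hypothetical segmentation specialist: assigns a dummy bounding box.
    bb.update(obj, "bbox", (0, 0, 100, 20))


def recognize(bb: Blackboard, obj: LayoutObject) -> None:
    # Hypothetical recognition specialist: stands in for a real recognizer.
    bb.update(obj, "text", f"<{obj.label} text>")


def recognition_task(bb: Blackboard, obj: LayoutObject, key: str) -> None:
    # A task reacting to an event: once a region is segmented, recognize it.
    if key == "bbox":
        recognize(bb, obj)


def run(model: GenericObject) -> Blackboard:
    bb = Blackboard()
    bb.listeners.append(recognition_task)
    bb.instantiate(model)
    # Toy task selector: keep processing the first pending object.
    while bb.pending():
        segment(bb, bb.pending()[0])
    return bb


letter = GenericObject("letter", [GenericObject("header"), GenericObject("body")])
board = run(letter)
for o in board.objects:
    print(o.label, o.descriptor)
```

Note that segment and recognize never call each other: the segmentation specialist only writes a descriptor entry, and the resulting event is what leads the task to invoke recognition. This is the indirect cooperation through the blackboard that the overview describes, here in its simplest possible form.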