Adaptive Layout Analysis of Document Images

Layout analysis is the process of extracting a hierarchical structure that describes the layout of a page. In the document processing system WISDOM++, layout analysis is performed in two steps: first, the global analysis determines possible areas containing paragraphs, sections, columns, figures and tables; second, the local analysis groups together blocks that possibly fall within the same area. The result of the local analysis therefore depends strongly on the quality of the global analysis. In this paper we investigate the possibility of supporting the user in correcting the results of the global analysis: the user corrects those results manually, and rules for layout correction are then learned from the sequence of user actions. Experimental results on a set of multipage documents are reported.
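To illustrate why the local analysis depends on the global one, the following is a minimal sketch and not the WISDOM++ algorithm. It assumes that areas and blocks are axis-aligned rectangles and that a block belongs to an area when it is fully contained in it; the names `Rect` and `group_blocks` are hypothetical and introduced only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page coordinates (assumed representation)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, other: "Rect") -> bool:
        # True when `other` lies entirely inside this rectangle.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


def group_blocks(areas, blocks):
    """Group blocks by the index of the first area that fully contains them.

    Blocks that fall in no area are collected under the key None.
    """
    groups = defaultdict(list)
    for block in blocks:
        for index, area in enumerate(areas):
            if area.contains(block):
                groups[index].append(block)
                break
        else:
            groups[None].append(block)
    return dict(groups)


# Hypothetical page: three text blocks.
blocks = [Rect(10, 10, 90, 20), Rect(10, 30, 90, 40), Rect(10, 52, 90, 58)]

# Areas as a global analysis might propose them: the third block is missed.
proposed_areas = [Rect(0, 0, 100, 50)]
before = group_blocks(proposed_areas, blocks)

# Areas after a user correction that enlarges the area to cover the third block.
corrected_areas = [Rect(0, 0, 100, 60)]
after = group_blocks(corrected_areas, blocks)
```

In this sketch, a single correction to the proposed area changes which blocks are grouped together, which is the dependency the two-step process describes. A sequence of such corrections is the kind of user input from which correction rules could be learned.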
