A multiresolution approach for page segmentation

We propose a new page segmentation method that distinguishes text from graphics using a multiresolution representation of the page image. The approach analyzes a set of feature maps computed at different resolution levels. The final output is a description of the physical structure of the page: the page image is divided into blocks that represent its components, such as text, line drawings, and pictures. The result requires only a small amount of memory beyond that needed for the image itself, and can serve as the first step toward more detailed analysis such as optical character recognition.
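
The idea of reading feature maps at several resolution levels can be illustrated with a minimal sketch. It is not the method of this paper: the abstract does not name the features or the decision rules, so the sketch assumes a binary page image (1 = ink), uses mean ink density as the only feature, and labels cells with two hypothetical thresholds (`low`, `high`). The function names are likewise illustrative.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block.

    A trailing odd row or column is dropped.
    """
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(img, levels):
    """Return a list of ink-density maps, finest level first."""
    pyramid = [[[float(v) for v in row] for row in img]]
    for _ in range(levels - 1):
        top = pyramid[-1]
        if len(top) < 2 or len(top[0]) < 2:
            break  # cannot halve any further
        pyramid.append(downsample(top))
    return pyramid


def label_cells(density_map, low=0.05, high=0.6):
    """Label each cell from its ink density.

    The thresholds are assumptions chosen for illustration only.
    """
    def label(d):
        if d < low:
            return "background"
        if d < high:
            return "text"
        return "picture"

    return [[label(d) for d in row] for row in density_map]


if __name__ == "__main__":
    # Toy 4x4 page: a solid dark block in the top-left corner.
    page = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    for level, density in enumerate(build_pyramid(page, levels=3)):
        print(level, label_cells(density))
```

Each coarser level stores a quarter of the cells of the level below it, so the whole pyramid adds roughly one third of the original image size in memory. A real system would combine several features per level rather than a single density value.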
