There is a significant and growing need to convert printed paper documents into electronic form. Document image analysis is concerned with segmenting a document image into regions of interest, describing those regions, and classifying them according to the type of their contents. A new unified approach to page segmentation and classification, based on describing the background with tiles, is presented. The segmentation method is flexible enough to analyse and describe regions in complicated layouts where other methods fail. Severely skewed images are handled equally well, with no additional computation. Classification is based on textural features that are derived by simple calculations from the representation of space in the regions, which is produced during segmentation. This is a considerable advantage over previous methods, which require extra image accesses and lengthy computations. Overall, segmentation and classification by white tiles is fast and efficient, because neither stage requires a time-consuming process.
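To make the idea of describing the background with tiles more concrete, the following is a minimal sketch under stated assumptions. It is not the authors' algorithm, since the abstract does not specify how tiles are constructed. It assumes a binary page image given as a list of rows (1 for ink, 0 for background), builds tiles by stacking identical horizontal white runs from consecutive rows, and uses a hypothetical `white_fraction` measure as a stand-in for the textural features mentioned above. All function names are illustrative.

```python
from typing import Dict, List, Tuple

# A tile is an axis-aligned background rectangle with exclusive end coordinates:
# (x_start, x_end, y_start, y_end).
Tile = Tuple[int, int, int, int]


def white_runs(row: List[int]) -> List[Tuple[int, int]]:
    """Return maximal runs of background pixels (value 0) as (start, end), end exclusive."""
    runs = []
    start = None
    for x, pixel in enumerate(row):
        if pixel == 0 and start is None:
            start = x
        elif pixel != 0 and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def white_tiles(image: List[List[int]]) -> List[Tile]:
    """Cover the background with disjoint rectangles (assumed construction, not the paper's)."""
    tiles: List[Tile] = []
    open_tiles: Dict[Tuple[int, int], int] = {}  # run extent -> row where the tile began
    for y, row in enumerate(image):
        runs = set(white_runs(row))
        # Close every growing tile whose run does not continue on this row.
        for span in list(open_tiles):
            if span not in runs:
                tiles.append((span[0], span[1], open_tiles.pop(span), y))
        # Start a new tile for each run that is not already growing.
        for span in runs:
            open_tiles.setdefault(span, y)
    # Close the tiles that reach the bottom of the image.
    for span, y_start in open_tiles.items():
        tiles.append((span[0], span[1], y_start, len(image)))
    return sorted(tiles, key=lambda t: (t[2], t[0]))


def white_fraction(tiles: List[Tile], region: Tile) -> float:
    """Hypothetical feature: share of a region's area covered by white tiles."""
    rx0, rx1, ry0, ry1 = region
    covered = 0
    for x0, x1, y0, y1 in tiles:
        width = min(x1, rx1) - max(x0, rx0)
        height = min(y1, ry1) - max(y0, ry0)
        if width > 0 and height > 0:
            covered += width * height
    return covered / ((rx1 - rx0) * (ry1 - ry0))


# Toy page: two ink blocks surrounded by background.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
tiles = white_tiles(page)
```

In this sketch the feature is computed from the tile list alone, without returning to the pixel data, which mirrors the abstract's point that classification reuses the representation produced during segmentation. Skew handling and the actual features used by the authors are not modelled here.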