Granulometric analysis of document images

We report on a new form of multivariate granulometry based on rectangles of varying size and aspect ratio, and use it to describe visual similarity between document images. Rectangular granulometries probe the layout structure of a document image, and the rectangular size distributions derived from them serve as descriptors for each image. Feature selection reduces the dimensionality and redundancy of these size distributions while preserving the essence of a document's visual appearance. Experimental results indicate that rectangular size distributions are an effective way to characterize the visual similarity of document images, and that they allow classification results to be interpreted in the original image space.
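A rectangular size distribution of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a binary page image stored as a list of lists with 1 for foreground (ink) and 0 for background, uses morphological openings by axis-aligned rectangles w pixels wide and h pixels tall, and takes the distribution value at (w, h) to be the fraction of foreground area removed by that opening. The names `erode_rect`, `dilate_rect`, `open_rect`, `area`, `rectangular_size_distribution` and `l1_distance` are hypothetical, and the L1 distance is only one illustrative way to compare two descriptors; it is not the feature selection or classification scheme of the paper.

```python
from typing import List

Image = List[List[int]]  # hypothetical representation: 1 = foreground (ink), 0 = background


def erode_rect(img: Image, w: int, h: int) -> Image:
    """Erode by a w x h rectangle anchored at its top-left pixel.

    Output (r, c) is 1 iff the rectangle covering rows r..r+h-1 and
    columns c..c+w-1 lies entirely inside the foreground. Pixels outside
    the image count as background, so the rectangle must fit in the image.
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all(img[r + dr][c + dc] for dr in range(h) for dc in range(w)):
                out[r][c] = 1
    return out


def dilate_rect(img: Image, w: int, h: int) -> Image:
    """Dilate by the same top-left-anchored w x h rectangle, clipped at the border."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                for dr in range(min(h, rows - r)):
                    for dc in range(min(w, cols - c)):
                        out[r + dr][c + dc] = 1
    return out


def open_rect(img: Image, w: int, h: int) -> Image:
    """Opening: the union of all w x h rectangles that fit inside the foreground."""
    return dilate_rect(erode_rect(img, w, h), w, h)


def area(img: Image) -> int:
    """Number of foreground pixels."""
    return sum(sum(row) for row in img)


def rectangular_size_distribution(img: Image, max_w: int, max_h: int) -> List[List[float]]:
    """dist[h-1][w-1] = fraction of foreground removed by opening with a w x h rectangle."""
    total = area(img)
    dist = []
    for h in range(1, max_h + 1):
        row = []
        for w in range(1, max_w + 1):
            kept = area(open_rect(img, w, h))
            row.append(1.0 - kept / total if total else 0.0)
        dist.append(row)
    return dist


def l1_distance(a: List[List[float]], b: List[List[float]]) -> float:
    """Illustrative dissimilarity between two size distributions of the same shape."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Because a larger rectangle is a union of translates of any smaller one, the opening by the larger rectangle removes at least as much foreground, so the distribution is nondecreasing in both width and height; this sieving property is what makes the rectangle family a granulometry. For example, on an 8 x 5 image holding a horizontal bar 6 pixels wide and 2 tall plus a separate vertical stroke 1 pixel wide and 3 tall, the rectangle 2 wide and 1 tall removes only the vertical stroke, while the rectangle 1 wide and 3 tall removes only the horizontal bar; the two axes of the distribution thus separate horizontally and vertically elongated layout components. The nested loops cost O(rows · cols · w · h) per opening, so for real page images a library routine such as `scipy.ndimage.binary_opening` with a rectangular `structure` array would be the practical choice.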
