Multiresolution morphological analysis of document images

An image-based approach to document image analysis is presented that uses shape and textural properties interchangeably at multiple scales. Image-based techniques permit a relatively small number of simple and fast operations to be used for a wide variety of analysis problems with document images. The primary binary image operations are morphological and multiresolution. The generalized opening, a morphological operation, allows extraction of image features that have both shape and textural properties, and that are not limited by properties related to image connectivity. Reduction operations are necessary because of the large number of pixels at scanning resolution, and threshold reduction is used for efficient and controllable shape and texture transformations between resolution levels. Aspects of these techniques, which include sequences of threshold reductions, are illustrated by problems such as text/halftone segmentation and word-level extraction. Both the generalized opening and these multiresolution operations are then used to identify italic and bold words in text. These operations are performed without any attempt to identify individual characters. Their robustness derives from the aggregation of statistical properties over entire words. However, the analysis of the statistical properties is performed implicitly, in large part through nonlinear image processing operations. The approximate computational cost of the basic operations is given, and the importance of operating at the lowest feasible resolution is demonstrated.
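
As an illustration of the threshold reduction mentioned above, the following is a minimal sketch, not the implementation used in this work. It assumes the common 2x formulation, in which each 2x2 tile of the binary image maps to one output pixel that is ON when at least a chosen number (1 to 4) of the tile's pixels are ON. The function names `threshold_reduce_2x` and `reduce_cascade`, and the use of NumPy, are assumptions made for the example.

```python
import numpy as np

def threshold_reduce_2x(img, threshold):
    """Reduce a binary image by 2x in each dimension (illustrative sketch).

    Each output pixel is ON when at least `threshold` (1..4) of the four
    pixels in the corresponding 2x2 input tile are ON.
    """
    if not 1 <= threshold <= 4:
        raise ValueError("threshold must be between 1 and 4")
    img = np.asarray(img, dtype=bool)
    h2, w2 = img.shape[0] // 2, img.shape[1] // 2  # odd trailing row/column is dropped
    tiles = img[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    counts = tiles.sum(axis=(1, 3))  # number of ON pixels in each 2x2 tile
    return counts >= threshold

def reduce_cascade(img, thresholds):
    """Apply a sequence of 2x threshold reductions, one per entry in `thresholds`."""
    out = np.asarray(img, dtype=bool)
    for t in thresholds:
        out = threshold_reduce_2x(out, t)
    return out

# Hypothetical 4x4 test image, used only to show the effect of the threshold.
page = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=bool)

dilating = threshold_reduce_2x(page, 1)  # keeps any tile containing an ON pixel
eroding = threshold_reduce_2x(page, 4)   # keeps only fully ON tiles
```

A low threshold behaves like a dilation followed by subsampling, so sparse texture survives the reduction, while a high threshold behaves like an erosion and keeps only solid regions. Choosing the threshold at each level of a cascade is what makes the transformation between resolution levels controllable.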