Background noise detection and cleaning in document images

A digitized binary image in which text overlaps background noise or a complex background image is poor input for an OCR system. Most OCR systems can recognize only black characters on a uniform white background, or vice versa. Text that overlaps background noise can be cleaned by morphological opening with an appropriate structuring element, which removes the background components that touch the characters. Applied globally to a document image, however, such methods degrade the "clean" text (i.e., text on a uniform white background), and character recognition accuracy drops rapidly. A simple and effective approach is to distinguish the "noisy" text regions, where the overhead of cleaning and enhancement is justified, from the "clean" text regions, where an OCR device already yields good recognition results. The author focuses on detecting noise regions and presents an objective evaluation method. As an example, the method is used to evaluate a standard noise cleaning method for document images.
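The contrast between global and region-restricted opening can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`erode`, `dilate`, `opening`, `open_region`), the list-of-lists image representation (1 = foreground pixel), the 2x2 structuring element, and the hand-picked bounding box are all assumptions chosen for illustration. The abstract does not specify how noisy regions are detected, so the box is simply given here.

```python
# Binary images are lists of rows; 1 = foreground (ink), 0 = background.
# A structuring element is a list of (dy, dx) offsets. All names are illustrative.

def erode(img, se):
    """Keep a pixel only if every offset of the structuring element hits foreground."""
    h, w = len(img), len(img[0])
    return [
        [
            int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy, dx in se
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]

def dilate(img, se):
    """Stamp the structuring element onto every foreground pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy, dx in se:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] = 1
    return out

def opening(img, se):
    """Morphological opening: erosion followed by dilation with the same element."""
    return dilate(erode(img, se), se)

def open_region(img, se, box):
    """Open only inside box = (top, left, bottom, right), bottom/right exclusive.

    Pixels outside the box are copied unchanged. The crop is processed in
    isolation, so pixels beyond the box edge count as background.
    """
    top, left, bottom, right = box
    sub = [row[left:right] for row in img[top:bottom]]
    cleaned = opening(sub, se)
    out = [row[:] for row in img]
    for y in range(top, bottom):
        out[y][left:right] = cleaned[y - top]
    return out

SQUARE_2X2 = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Column 0 holds a one-pixel-wide "clean" stroke; columns 3-6 hold a
# thick stroke plus one isolated noise pixel at row 1, column 6.
page = [
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]

globally_opened = opening(page, SQUARE_2X2)
locally_opened = open_region(page, SQUARE_2X2, (0, 3, 4, 7))
```

In this toy page, the global opening removes the noise pixel but also erases the thin stroke in column 0, while the region-restricted opening removes the noise pixel and leaves the thin stroke intact. This mirrors the trade-off the abstract describes, under the assumption that the noisy region has already been located.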
