Context-based filtering of document images

Abstract. Two statistical context-based filters are introduced for enhancing binary document images prior to compression and recognition. The simple context filter unconditionally changes uncommon pixels in low-information contexts, whereas the gain–loss filter (GLF) changes a pixel only if the gain in compression outweighs the loss of information. Both filtering methods alleviate the loss in compression performance caused by digitization noise while preserving image quality, measured as optical character recognition (OCR) accuracy. The GLF approximately reaches the compression limit, which is estimated by compressing the noiseless digital original.
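The idea behind the simple context filter can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the 4-pixel causal template `TEMPLATE`, the threshold `max_rare_prob`, and the use of static statistics gathered in a single pass over the unfiltered image are all assumptions made for illustration. A context is treated here as low-information when the less frequent pixel value occurs in it with probability at most `max_rare_prob`; pixels carrying that rare value are then set to the common value.

```python
import numpy as np

# Hypothetical 4-pixel causal template given as (row offset, column offset).
# The template actually used by the filter is not specified in the abstract.
TEMPLATE = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]


def context_indices(img):
    """Return an integer context id per pixel, built from its template neighbours."""
    img = np.asarray(img, dtype=np.int64)
    h, w = img.shape
    # Pixels outside the image are assumed to be white (0).
    padded = np.pad(img, 1, constant_values=0)
    ctx = np.zeros((h, w), dtype=np.int64)
    for dr, dc in TEMPLATE:
        neighbour = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
        ctx = (ctx << 1) | neighbour
    return ctx


def simple_context_filter(img, max_rare_prob=0.05):
    """Set rare pixels to the common value in low-information contexts.

    img is a 2-D array with 0 = white and 1 = black. max_rare_prob is an
    assumed threshold on the probability of the rare pixel value.
    """
    img = np.asarray(img, dtype=np.int64)
    ctx = context_indices(img)
    n_ctx = 1 << len(TEMPLATE)

    # Static statistics: number of pixels and of black pixels per context.
    total = np.bincount(ctx.ravel(), minlength=n_ctx)
    black = np.bincount(ctx.ravel(), weights=img.ravel(), minlength=n_ctx)
    # Unseen contexts get probability 0.5 so that they are never filtered.
    p_black = np.divide(black, total, out=np.full(n_ctx, 0.5), where=total > 0)

    common = (p_black >= 0.5).astype(np.int64)
    rare_prob = np.minimum(p_black, 1.0 - p_black)
    low_info = rare_prob <= max_rare_prob

    out = img.copy()
    mask = low_info[ctx]               # pixels lying in low-information contexts
    out[mask] = common[ctx][mask]      # unconditional change to the common value
    return out


if __name__ == "__main__":
    page = np.zeros((20, 20), dtype=np.int64)
    page[10, 10] = 1                   # one isolated noise pixel
    cleaned = simple_context_filter(page)
    print(int(page.sum()), int(cleaned.sum()))
```

In this sketch the contexts are computed once from the noisy input, so the result does not depend on scan order. A gain–loss variant would replace the unconditional assignment with a per-pixel test comparing the estimated compression gain against the information lost; the abstract does not specify that test, so it is not sketched here.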