High Efficient Compression Strategy for Scanned Receipts and Handwritten Documents

Image compression is a long-established topic in image processing that has been widely studied and applied. Standards such as JPEG and JPEG 2000 have been published for applications dealing with grayscale or color photographs and medical images. However, some specific applications, such as electronic financial management systems (eFMS), require considerably more efficient algorithms for compressing scanned receipts and handwritten documents. This paper discusses a new compression strategy based on separating foreground from background. It rests on the assumption that the foreground, which carries the most important information, tolerates little degradation, while the background, which only conveys the look of the original document, can be degraded more heavily. The image is first transformed to the YCbCr color space to separate luminance from chrominance. Foreground and background are then extracted from the luminance subimage with a median filter. Both are down-sampled and clustered separately, based on their gray-level histograms. The chrominance subimages are also down-sampled and converted to a palette-index model by clustering on their 2D histogram. Finally, all clustered subimages are encoded with the run-length encoding (RLE) algorithm introduced in JPEG and combined into the compressed output. The results show that the presented strategy achieves much higher compression ratios than the JPEG standard.
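The first steps of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the filter radius, and the threshold of 40 are assumptions chosen for illustration, the color conversion uses the full-range BT.601 coefficients from JFIF, the foreground test assumes dark ink on lighter paper, and the run-length coder is a plain value/count scheme standing in for the JPEG-style RLE mentioned in the abstract. Down-sampling and histogram clustering are omitted.

```python
from statistics import median


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion (JFIF coefficients); returns rounded (Y, Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(v):
        return max(0, min(255, round(v)))

    return clamp(y), clamp(cb), clamp(cr)


def median_filter(gray, radius=1):
    """Median filter over a 2D list of gray levels; borders are handled by clamping indices."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                gray[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-radius, radius + 1)
                for dj in range(-radius, radius + 1)
            ]
            out[i][j] = median(window)
    return out


def split_foreground(gray, radius=1, threshold=40):
    """Estimate the background with a median filter and mark much darker pixels as foreground.

    Assumption: ink is darker than paper; `threshold` is a hypothetical tuning value.
    """
    background = median_filter(gray, radius)
    mask = [
        [1 if background[i][j] - gray[i][j] > threshold else 0
         for j in range(len(gray[0]))]
        for i in range(len(gray))
    ]
    return mask, background


def run_length_encode(values):
    """Plain run-length encoding: a list of (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(run) for run in runs]


# Tiny luminance image: light paper (200) with one ink pixel (30) in the middle.
luma = [[200, 200, 200],
        [200, 30, 200],
        [200, 200, 200]]
mask, background = split_foreground(luma)
flat_mask = [v for row in mask for v in row]
encoded_mask = run_length_encode(flat_mask)
```

In this toy case the median filter removes the isolated ink pixel from the background estimate, so the background becomes uniform and the mask encodes as three runs. Uniform regions of this kind are what make run-length coding effective after clustering.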
