Textual image compression: two-stage lossy/lossless encoding of textual images

This paper describes a two-stage method for compressing bilevel images that is particularly effective for images containing repeated subimages, notably text. In the first stage, connected groups of pixels, corresponding approximately to individual characters, are extracted from the image. These are matched against an adaptively constructed library of the patterns seen so far, and the resulting sequence of symbol identification numbers is coded and transmitted. From this information, together with the library itself and the offset from one mark to the next, an approximate image can be reconstructed. The result is a lossy compression method that outperforms other schemes. The second stage uses the reconstructed image as an aid for encoding the original image with a statistical context-based compression technique. The total bandwidth required for exact transmission is appreciably lower than that required by other lossless binary image compression methods. Taken together, the lossy and lossless methods provide an effective two-stage progressive transmission capability for textual images, with applications in legal, medical, and historical settings and in archiving generally.
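
The first stage can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the authors' algorithm. Marks are taken to be 8-connected black components. A mark matches a library entry only when both have the same bounding-box size and differ in at most `threshold` pixels, which is a hypothetical stand-in for the paper's matching criterion. The first instance of each shape is stored as its library prototype. Coding of the symbol numbers and offsets is omitted, and `extract_marks`, `encode_stage1`, and `reconstruct` are illustrative names.

```python
from collections import deque

# A bilevel image is a list of rows of ints: 1 = black, 0 = white.

def extract_marks(image):
    """Return 8-connected black components as (top, left, bitmap) tuples,
    ordered by the top-left corner of each bounding box."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    marks = []
    for y in range(h):
        for x in range(w):
            if image[y][x] and not seen[y][x]:
                # Breadth-first flood fill collects one connected component.
                pixels, queue = [], deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and image[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                top = min(p[0] for p in pixels)
                left = min(p[1] for p in pixels)
                bottom = max(p[0] for p in pixels)
                right = max(p[1] for p in pixels)
                bmp = [[0] * (right - left + 1) for _ in range(bottom - top + 1)]
                for py, px in pixels:
                    bmp[py - top][px - left] = 1
                marks.append((top, left, bmp))
    marks.sort(key=lambda m: (m[0], m[1]))
    return marks

def mismatch(a, b):
    """Count differing pixels; bitmaps of different sizes never match."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def encode_stage1(image, threshold=1):
    """Return (library, symbols); each symbol is (dy, dx, id), with the
    offset taken relative to the previous mark's top-left corner."""
    library, symbols = [], []
    prev = (0, 0)
    for top, left, bmp in extract_marks(image):
        best = None
        for i, proto in enumerate(library):
            d = mismatch(bmp, proto)
            if d is not None and d <= threshold and (best is None or d < best[1]):
                best = (i, d)
        if best is None:
            library.append(bmp)  # unseen shape: grow the library adaptively
            sym = len(library) - 1
        else:
            sym = best[0]
        symbols.append((top - prev[0], left - prev[1], sym))
        prev = (top, left)
    return library, symbols

def reconstruct(library, symbols, h, w):
    """Paste library prototypes at the decoded positions (the lossy image)."""
    out = [[0] * w for _ in range(h)]
    y = x = 0
    for dy, dx, sym in symbols:
        y, x = y + dy, x + dx
        for r, row in enumerate(library[sym]):
            for c, v in enumerate(row):
                if v:
                    out[y + r][x + c] = 1
    return out
```

In this sketch, raising `threshold` lets more marks share a prototype, which shrinks the library but makes the reconstructed image less faithful to the original. The remaining pixel differences are what the second, lossless stage has to encode.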

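The second stage's key idea can be illustrated by measuring code length rather than building a full arithmetic coder. Each pixel's probability is conditioned on the approximate image, which the decoder already holds. The choices in this sketch are assumptions, not the paper's design. The context combines four causal neighbours from the original (W, NW, N, NE) with a hypothetical 3x3 window of the reconstructed image centred on the current pixel. Per-context counts start at one. The reported figure is the ideal adaptive code length, the sum of -log2 p over all pixels, which an arithmetic coder would approach. `pixel`, `context`, and `code_length` are illustrative names.

```python
import math

def pixel(img, y, x):
    """Return img[y][x], treating positions outside the image as white (0)."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return img[y][x]
    return 0

def context(orig, recon, y, x):
    """Hypothetical template: 4 causal pixels of the original plus, if given,
    a 3x3 window of the reconstructed image around (y, x)."""
    bits = [pixel(orig, y, x - 1), pixel(orig, y - 1, x - 1),
            pixel(orig, y - 1, x), pixel(orig, y - 1, x + 1)]
    if recon is not None:
        # The decoder has all of recon, so non-causal positions are allowed.
        bits += [pixel(recon, y + dy, x + dx)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    ctx = 0
    for b in bits:
        ctx = (ctx << 1) | b
    return ctx

def code_length(orig, recon=None):
    """Ideal adaptive code length in bits for coding orig in raster order,
    using per-context [white, black] counts that both start at 1."""
    counts = {}
    total = 0.0
    for y in range(len(orig)):
        for x in range(len(orig[0])):
            c = counts.setdefault(context(orig, recon, y, x), [1, 1])
            v = orig[y][x]
            total -= math.log2(c[v] / (c[0] + c[1]))
            c[v] += 1  # adapt after coding, as the decoder can do the same
    return total
```

Only the original-image part of the template must be causal, because the reconstructed image is transmitted first. Wherever the lossy image agrees with the original, its centre pixel is a strong predictor, so in this sketch most of the bits go to the pixels where the two images differ.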