Compressing Compound Documents

In this chapter we introduce the basic methods for compressing documents in raster format that may contain a mixture of text, graphics, and pictures. Such mixed-content documents are called compound documents. We present in detail the multilayer approach, which decomposes a compound image into homogeneous individual planes, i.e., into images that contain only text, only pictures, only graphics, and so on. We focus on the mixed raster content (MRC) standard as the main compound image representation. The MRC framework facilitates compression and is also part of the JPEG 2000 Part 6 standard (JPM). We describe common imaging models and explain the strategies and goals of segmentation. We present a block-based optimized segmentation algorithm along with its fast approximation. We also describe efficient plane filling algorithms, which are a necessary part of an MRC representation. JPEG 2000’s JPM profile is discussed as a framework for general multipage, multilayer document compression. Results are shown, together with decompressed images, to illustrate the performance of the compound document compression algorithms discussed in this chapter.
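To make the multilayer idea concrete, the following is a minimal sketch of a three-layer MRC-style model (binary mask, foreground plane, background plane) on a tiny grayscale image. It is not the chapter's optimized block-based segmentation: the global threshold, the mean-value plane filling, and the function names `decompose` and `recompose` are all assumptions made for illustration.

```python
def decompose(image, threshold=128):
    """Split a grayscale image (list of rows) into mask, foreground, background.

    Assumption: dark pixels (below `threshold`) are treated as text-like and
    assigned to the foreground; a real segmenter would be far more careful.
    """
    mask = [[1 if p < threshold else 0 for p in row] for row in image]

    fg_vals = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if m]
    bg_vals = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if not m]

    # Hypothetical plane filling: unused ("don't care") pixels in each plane
    # get the mean of that plane's used pixels, so the plane stays smooth.
    fg_fill = sum(fg_vals) // len(fg_vals) if fg_vals else 0
    bg_fill = sum(bg_vals) // len(bg_vals) if bg_vals else 255

    foreground = [[p if m else fg_fill for p, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    background = [[bg_fill if m else p for p, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    return mask, foreground, background


def recompose(mask, foreground, background):
    """Rebuild the image: take the foreground where mask is 1, else background."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]
```

In this sketch the recomposed image equals the input because no plane is compressed. In an actual MRC codec each plane would be coded separately with a coder suited to its content, and the filled "don't care" pixels matter because they affect how well each plane compresses.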
