Scanned Compound Document Encoding Using Multiscale Recurrent Patterns

In this paper, we propose a new encoder for scanned compound documents, based on the multidimensional multiscale parser (MMP), a recently introduced coding paradigm. MMP uses approximate pattern matching with adaptive multiscale dictionaries that contain concatenations of scaled versions of previously encoded image blocks. These features let MMP adjust to the characteristics of the input image, resulting in high coding efficiency for a wide range of image types. This versatility makes MMP a good candidate for encoding compound digital documents. The proposed algorithm first classifies each image block as either smooth (texture) or nonsmooth (text and graphics). Smooth and nonsmooth blocks are then compressed using different MMP-based encoders, each adapted to one block type. The adaptive use of these two encoders resulted in performance gains over the original MMP algorithm, further increasing its advantage over current state-of-the-art image encoders for scanned compound images, without compromising performance for other image types.
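
The block classification step can be illustrated with a minimal sketch. The abstract does not specify the classification criterion, so the rule used here (the dynamic range of pixel values within a block, compared against a fixed threshold) is an assumption chosen only for illustration, as are the names `block_range` and `classify_blocks`, the block size, and the threshold value. It is not the classifier used by the proposed encoder.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values (0..255)


def block_range(block: Image) -> int:
    """Return the difference between the largest and smallest pixel in a block."""
    pixels = [p for row in block for p in row]
    return max(pixels) - min(pixels)


def classify_blocks(image: Image, block_size: int = 16, threshold: int = 32) -> List[List[str]]:
    """Label each block_size x block_size tile as 'smooth' or 'nonsmooth'.

    Assumed rule (hypothetical, not taken from the paper): a tile whose
    pixel range exceeds `threshold` is treated as text/graphics
    ('nonsmooth'); otherwise it is treated as texture ('smooth').
    """
    height, width = len(image), len(image[0])
    labels = []
    for top in range(0, height, block_size):
        row_labels = []
        for left in range(0, width, block_size):
            # Slicing clips automatically at the image border.
            tile = [row[left:left + block_size] for row in image[top:top + block_size]]
            row_labels.append("nonsmooth" if block_range(tile) > threshold else "smooth")
        labels.append(row_labels)
    return labels


# Toy 4x8 image: a flat gray region next to high-contrast, text-like stripes.
toy_image = [[100] * 4 + [0, 255, 0, 255] for _ in range(4)]
toy_labels = classify_blocks(toy_image, block_size=4, threshold=32)
```

In an encoder of the kind described above, each label would then select which of the two MMP-based encoders processes the corresponding block; that dispatch is omitted here because the abstract gives no details of either encoder.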
