Model-Based Iterative Restoration for Binary Document Image Compression with Dictionary Learning

The noise inherent in an observed (e.g., scanned) binary document image degrades image quality and harms the compression ratio by breaking pattern repetition and adding entropy to the image. In this paper, we design a cost function in a Bayesian framework with dictionary learning. Minimizing this cost function produces a restored image of better quality than the observed noisy image, together with a dictionary for representing and encoding the image. After restoration, we use this dictionary (obtained from the same cost function) to encode the restored image, following the symbol-dictionary framework of the JBIG2 standard in lossless mode. Experimental results on a variety of document images demonstrate that our method improves image quality relative to the observed image and simultaneously improves the compression ratio. For test images with synthetic noise, our method reduces the number of flipped pixels by 48.2% and improves the compression ratio by 36.36% compared with the best encoding methods. For test images with real noise, our method visually improves image quality and outperforms the cutting-edge method by 28.27% in terms of compression ratio.
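As a rough illustration of how a flipped-pixel figure of the kind quoted above might be measured, the following minimal sketch counts the pixels that differ from a clean reference image before and after restoration, then reports the relative reduction. It assumes the metric is a pixel-wise comparison against a noise-free ground truth; the function names `flipped_pixels` and `percent_reduction` and the toy 4x4 images are hypothetical and are not taken from the paper.

```python
import numpy as np


def flipped_pixels(reference, candidate):
    """Count pixels whose binary value differs between two same-shape images."""
    reference = np.asarray(reference, dtype=bool)
    candidate = np.asarray(candidate, dtype=bool)
    if reference.shape != candidate.shape:
        raise ValueError("images must have the same shape")
    # XOR is True exactly where the two binary images disagree.
    return int(np.count_nonzero(reference ^ candidate))


def percent_reduction(before, after):
    """Relative reduction, in percent, from `before` to `after`."""
    if before == 0:
        raise ValueError("no flipped pixels to reduce")
    return 100.0 * (before - after) / before


# Toy data (hypothetical): a clean 4x4 image with a 2x2 foreground block.
clean = np.zeros((4, 4), dtype=bool)
clean[1:3, 1:3] = True

# Simulated observation: four pixels flipped relative to the clean image.
noisy = clean.copy()
for r, c in [(0, 0), (1, 1), (3, 2), (2, 3)]:
    noisy[r, c] = not noisy[r, c]

# Simulated restoration result: two of the four flips remain.
restored = clean.copy()
for r, c in [(0, 0), (2, 3)]:
    restored[r, c] = not restored[r, c]

flips_before = flipped_pixels(clean, noisy)
flips_after = flipped_pixels(clean, restored)
reduction = percent_reduction(flips_before, flips_after)
```

On this toy input, four flipped pixels before restoration and two afterwards give a 50% reduction. The same ratio form would apply to compression ratios measured on the encoded bitstreams, although the encoder itself is outside the scope of this sketch.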
