Segmentation of compressed documents

We present a novel technique for segmenting a JPEG-compressed document based on block activity, measured as the number of bits spent to encode each block. Each bit count is mapped to a pixel brightness value in an auxiliary image, which is then used for segmentation. We introduce this auxiliary image and give an example of a simple segmentation algorithm that was successfully applied to test documents. The desired region can then be identified and cropped from (or replaced in) the compressed data without decompressing the image.
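The mapping from bit counts to an auxiliary image can be illustrated with a minimal sketch. It is not the algorithm from the paper. It assumes the per-block bit counts have already been obtained by parsing the compressed stream, and the function names (`activity_image`, `threshold_mask`, `bounding_box`), the peak normalization, and the fixed threshold are all hypothetical choices made for illustration.

```python
def activity_image(bit_counts, blocks_per_row):
    """Map per-block bit counts to 8-bit brightness values, one pixel per block.

    Assumption: brightness is scaled linearly so the most expensive block
    maps to 255. The abstract does not specify the mapping.
    """
    if not bit_counts or blocks_per_row <= 0 or len(bit_counts) % blocks_per_row:
        raise ValueError("bit_counts must be non-empty and fill whole rows")
    peak = max(bit_counts) or 1  # avoid division by zero if every count is 0
    pixels = [min(255, round(255 * b / peak)) for b in bit_counts]
    return [pixels[i:i + blocks_per_row]
            for i in range(0, len(pixels), blocks_per_row)]


def threshold_mask(image, threshold):
    """Mark blocks whose brightness reaches the threshold as high activity (1)."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]


def bounding_box(mask):
    """Return (top, left, bottom, right) in block units, or None if mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical 4x4 grid of block bit counts: a busy 2x2 region on a flat background.
bits = [4, 4, 4, 4,
        4, 60, 80, 4,
        4, 70, 90, 4,
        4, 4, 4, 4]
image = activity_image(bits, blocks_per_row=4)
mask = threshold_mask(image, threshold=128)
box = bounding_box(mask)
```

Because each pixel of the auxiliary image stands for one coded block, the resulting box is expressed in block coordinates, which is the granularity at which a region could be cropped or replaced in the compressed data.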
