Manipulation of text documents in the modified Group 4 domain

This paper presents a novel approach to document image compression that offers both efficient compression and flexible processing. By exploiting the structural characteristics of the compressed data, image operations can achieve high performance at low complexity. Based on CCITT Group 4, an improved coding scheme, modified Group 4 (MG4), is developed that exploits the two-dimensional correlation between scan lines. Operations such as skew detection, skew correction, and connected component extraction are then investigated and implemented. These operations are shown to run faster in the compressed domain than with traditional methods.
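
To make the two-dimensional correlation between scan lines concrete, the following minimal sketch extracts the changing elements (black/white transitions) of two adjacent scan lines and measures how far each transition moves from one line to the next. It illustrates only the general principle behind Group 4 style coding and is not the MG4 scheme itself. The names `changing_elements` and `transition_offsets` are hypothetical, and pairing transitions by their order is a simplifying assumption that holds only when both lines contain the same number of transitions.

```python
from typing import List, Sequence

WHITE, BLACK = 0, 1


def changing_elements(row: Sequence[int]) -> List[int]:
    """Return the indices at which the pixel color differs from the previous pixel.

    An imaginary white pixel is assumed before the first pixel of the line,
    which is the usual convention in CCITT facsimile coding.
    """
    positions = []
    previous = WHITE
    for index, pixel in enumerate(row):
        if pixel != previous:
            positions.append(index)
            previous = pixel
    return positions


def transition_offsets(reference_row: Sequence[int],
                       coding_row: Sequence[int]) -> List[int]:
    """Return the horizontal shift of each transition relative to the line above.

    Simplifying assumption (not part of any standard): transitions are paired
    by order, so both lines must contain the same number of transitions.
    """
    reference = changing_elements(reference_row)
    coding = changing_elements(coding_row)
    if len(reference) != len(coding):
        raise ValueError("toy pairing needs equal transition counts")
    return [c - r for r, c in zip(reference, coding)]


# Two adjacent scan lines of a slightly slanted stroke.
reference_line = [0, 0, 1, 1, 1, 0, 0, 0]
coding_line = [0, 0, 0, 1, 1, 1, 0, 0]

offsets = transition_offsets(reference_line, coding_line)
print(changing_elements(reference_line), changing_elements(coding_line), offsets)
```

In document images, adjacent scan lines tend to have transitions at nearly the same positions, so these offsets are usually small. Coding them instead of absolute positions is what makes two-dimensional schemes compact, and the same transition positions are the kind of structural information that compressed-domain operations can use without reconstructing the full bitmap.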