Wavelet domain textual coding of Ottoman script images

Image coding using the wavelet transform, the DCT, and similar transform techniques is well established. However, these coding methods neither take into account the special characteristics of the images in a database nor are they suitable for fast database search. In this paper, the digital archiving of Ottoman printings is considered. Ottoman documents are printed in Arabic letters. Witten et al. describe a scheme based on finding the characters in binary document images and encoding the positions of the repeated characters. This method compresses document images efficiently and is suitable for database search, but it cannot be applied to Ottoman or Arabic documents because the concept of a character is different in these scripts. Typically, one has to deal with compound structures consisting of a group of letters, so the matching criterion must be defined on those compound structures. Furthermore, for reasons described in the paper, the text images of Ottoman scripts are gray-tone or color images rather than binary images. In our method, the compound structure matching is carried out in the wavelet domain, which reduces the search space and increases the compression ratio. In addition to the wavelet transform, which corresponds to a linear subband decomposition, we also use a nonlinear subband decomposition. The filters in the nonlinear subband decomposition have the property of preserving edges in the low-resolution subband image.
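As a rough illustration of matching compound structures in a low-resolution subband, the following Python sketch may help. It is not the method of the paper: it assumes a one-level Haar low-pass subband (2x2 block averages, which equal the Haar low-low band up to a scale factor) and a mean-absolute-difference threshold, neither of which is specified in the abstract. The names `haar_lowpass`, `find_matches`, `paste`, and `max_mean_abs_diff`, as well as the toy glyph and page, are hypothetical.

```python
def haar_lowpass(img):
    """Low-low subband of a one-level 2-D Haar transform (up to scale).

    Each output sample is the average of one 2x2 block, so the result has
    half the rows and half the columns of the input.
    """
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [
        [(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
          + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
         for c in range(cols)]
        for r in range(rows)
    ]


def find_matches(subband, template, max_mean_abs_diff=0.05):
    """Return (row, col) positions where the template matches the subband.

    A position matches when the mean absolute gray-level difference is at
    most max_mean_abs_diff (an assumed criterion, chosen for illustration).
    """
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(len(subband) - th + 1):
        for c in range(len(subband[0]) - tw + 1):
            diff = sum(abs(subband[r + i][c + j] - template[i][j])
                       for i in range(th) for j in range(tw))
            if diff / (th * tw) <= max_mean_abs_diff:
                hits.append((r, c))
    return hits


def paste(img, glyph, top, left):
    """Copy a small glyph image into img at the given position (in place)."""
    for i, row in enumerate(glyph):
        for j, value in enumerate(row):
            img[top + i][left + j] = value


# Toy 4x4 gray-level "compound structure" (0 = ink, 1 = paper).
glyph = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [1, 0, 0, 1],
         [1, 0, 0, 0]]

# 16x16 white page containing the same structure twice.
page = [[1.0] * 16 for _ in range(16)]
paste(page, glyph, 2, 2)
paste(page, glyph, 10, 8)

low = haar_lowpass(page)          # 8x8 low-resolution subband
template = haar_lowpass(glyph)    # 2x2 template in the same domain
hits = find_matches(low, template)
print(hits)  # [(1, 1), (5, 4)]
```

In this toy setting the search runs over an 8x8 subband instead of the 16x16 page, and the second occurrence could be stored as a position plus a reference to the first. An edge-preserving nonlinear low-pass filter could be substituted for `haar_lowpass` without changing the matching step.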

[1]  Murray J. J. Holt, et al. A Fast Binary Template Matching Algorithm for Document Image Data Compression, 1988, Pattern Recognition.

[2]  Michel Barlaud, et al. Recursive biorthogonal wavelet transform for image coding, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[3]  M. Kunt, et al. High compression image coding using an adaptive morphological subband decomposition, 1995, Proc. IEEE.

[4]  O. Johnsen, et al. Coding of two-level pictures by pattern matching and substitution, 1983, The Bell System Technical Journal.

[5]  George Nagy, et al. A Means for Achieving a High Degree of Compaction on Scan-Digitized Printed Text, 1974, IEEE Transactions on Computers.

[6]  Michel Barlaud, et al. Image coding using wavelet transform, 1992, IEEE Trans. Image Process.

[7]  Edward H. Adelson, et al. Orthogonal Pyramid Transforms for Image Coding, 1987, Other Conferences.

[8]  A. Moffat. Two-level context based compression of binary images, 1991, [1991] Proceedings. Data Compression Conference.

[9]  I. Daubechies. Orthonormal bases of compactly supported wavelets, 1988.

[10]  Jelena Kovacevic, et al. Perfect Reconstruction Filter Banks for HDTV Representation and Coding, 1989.

[11]  Ian H. Witten, et al. Textual image compression: two-stage lossy/lossless encoding of textual images, 1994, Proc. IEEE.

[12]  Stéphane Mallat, et al. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation, 1989, IEEE Trans. Pattern Anal. Mach. Intell.

[13]  W.K. Pratt, et al. Combined symbol matching facsimile data compression system, 1980, Proceedings of the IEEE.