Semi-Lossless Text Compression: a Case Study
Text compression is generally assumed to be lossless. Kaufman and Klein [2] introduce the idea of semi-lossless text compression: the decompressed text is not identical to the original, yet it remains usable. A good-quality decompressed JPEG image is not identical to the original but can replace it in many applications; in the same way, our brain adjusts the slightly altered text to make it usable and understandable. In this paper we experiment with semi-lossless compression in a case study of small text files in the Italian language.
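The general idea can be illustrated with a minimal Python sketch. It assumes a deliberately simple lossy step that is not the method of [2] or of this paper: diacritics are stripped, so that "città" becomes "citta", which an Italian reader can still understand. The result is then compressed losslessly with zlib. The helper names `strip_accents`, `semi_lossless_compress` and `decompress` are hypothetical.

```python
import unicodedata
import zlib


def strip_accents(text: str) -> str:
    """Hypothetical lossy step: drop diacritics (e.g. 'perché' -> 'perche')."""
    # NFD splits an accented letter into base letter + combining mark (category Mn).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def semi_lossless_compress(text: str) -> bytes:
    # Lossy transformation first, then an ordinary lossless coder (DEFLATE via zlib).
    return zlib.compress(strip_accents(text).encode("utf-8"), 9)


def decompress(blob: bytes) -> str:
    # Recovers the *transformed* text exactly, not the original.
    return zlib.decompress(blob).decode("utf-8")


original = "Perché la città è così bella? Però più tardi andrò là."
restored = decompress(semi_lossless_compress(original))
print(restored)  # Perche la citta e cosi bella? Pero piu tardi andro la.
```

Decompression returns the transformed text exactly; the accents are lost for good. This is the trade-off semi-lossless compression accepts: in UTF-8 each accented letter takes two bytes and its unaccented form one, so the coder has fewer bytes to encode, while the text stays readable.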
[1] Ian H. Witten et al. Managing Gigabytes: Compressing and Indexing Documents and Images. 1999.
[2] Shmuel Tomi Klein et al. Semi-Lossless Text Compression. Stringology, 2004.