Compression of small text files using syllables
暂无分享,去创建一个
Summary form only given. We adapted well-known algorithms of adaptive Huffman coding and LZW to use syllables and words instead of characters for text compression. We tested the algorithms on collections of small or middle-sized files. Using syllable-based compression algorithms on English documents gives expected results: they outperform character-based and are outperformed by word-based versions of the same algorithm. According our tests both syllable- and word-based compression methods are sensitive to initial setting of their dictionaries. The decomposition of words into syllables is not trivial and is language dependent. An open issue is the applicability of syllable-based compression for different languages (like German, Rusian, or Hungarian) and its use in conjunction with other algorithms like block-sorting lossless compression
[1] Ian H. Witten,et al. Arithmetic coding revisited , 1998, TOIS.
[2] Michal Zemlicka,et al. Text Compression: Syllables , 2005, DATESO.