Genetic Algorithms in Syllable-Based Text Compression

Syllable-based text compression is a new approach to compression by symbols. In this concept, syllables are used as the compression symbols instead of the more common characters or words. The technique has proven effective, especially on short to medium-length text files. The effectiveness of the compression is greatly affected by the quality of the dictionaries of syllables characteristic of the given language. These dictionaries are usually created by a straightforward analysis of text corpora. In this paper we introduce another way of obtaining these dictionaries: using a genetic algorithm. We believe that dictionaries built this way may help us lower the compression ratio. We measure this effect on a set of Czech and English texts.