A guaranteed compression scheme for repetitive DNA sequences

We present a text compression scheme dedicated to DNA sequences. The exponential growth in the number of sequences creates a real need for analysis tools. In particular, there is a need for methods that classify sequences according to various criteria, one of which is sequence repetitiveness. A good lossless compression scheme is able to distinguish between "random" and "significant" repeats. The theoretical basis for this statement is found in Kolmogorov complexity theory.
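
The link between compressibility and repetitiveness can be illustrated with a minimal sketch. It does not implement the scheme presented here: it assumes a general-purpose compressor (Python's zlib) as a rough stand-in for an upper bound on Kolmogorov complexity, and both test sequences, as well as the helper name compression_ratio, are hypothetical.

```python
import random
import zlib


def compression_ratio(sequence: str) -> float:
    """Compressed size divided by original size in bytes (lower means more compressible)."""
    raw = sequence.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical inputs: a pseudo-random sequence and a pure tandem repeat,
# both 4000 bases long. The seed is fixed so the run is reproducible.
rng = random.Random(0)
random_seq = "".join(rng.choice("ACGT") for _ in range(4000))
repetitive_seq = "ACGTTGCA" * 500

random_ratio = compression_ratio(random_seq)
repetitive_ratio = compression_ratio(repetitive_seq)

print(f"random:     {random_ratio:.3f}")
print(f"repetitive: {repetitive_ratio:.3f}")
```

The repetitive sequence should compress far better than the pseudo-random one. Because each base carries at most 2 bits but is stored in an 8-bit byte, a ratio near 0.25 is the baseline for a sequence with no exploitable repeats, so only ratios clearly below that level point to repetitiveness.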