A universal compression strategy using sorting transformation

Although traditional universal compression algorithms can effectively exploit repetition that falls within a sliding window, they lose this advantage on message sources in which similar messages are distributed uniformly across the source. In this paper, we propose a universal segmenting-sorting compression algorithm to solve this problem. The key idea is to reorder the message source before compressing it with the LZ77 algorithm. We design transformation methods for two common data types: webpage corpora and access logs. The experimental results show that the segmenting-sorting transformation improves the compression ratio. Our algorithm achieves a compression ratio 20% to 50% lower than that of naive LZ77 while taking almost the same decompression time. For read-heavy sources, segmenting-sorting compression can reduce space cost while maintaining throughput.
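The reordering idea can be illustrated with a minimal sketch. It is not the transformation designed in this paper: it assumes newline-delimited segments, a plain lexicographic sort, and Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding over a 32 KB window) as a stand-in for LZ77. The function names and the synthetic log generator are hypothetical.

```python
import random
import zlib
from typing import Callable, List, Optional, Tuple


def segment_sort_compress(
    data: bytes, key: Optional[Callable[[bytes], object]] = None
) -> Tuple[bytes, List[int]]:
    """Split data into lines, sort them, and compress the reordered bytes."""
    segments = data.split(b"\n")
    sort_key = key if key is not None else (lambda seg: seg)
    # order[i] is the original index of the segment placed at position i.
    order = sorted(range(len(segments)), key=lambda i: sort_key(segments[i]))
    reordered = b"\n".join(segments[i] for i in order)
    return zlib.compress(reordered, 9), order


def segment_sort_decompress(blob: bytes, order: List[int]) -> bytes:
    """Decompress and put every segment back at its original position."""
    sorted_segments = zlib.decompress(blob).split(b"\n")
    restored = [b""] * len(order)
    for position, original_index in enumerate(order):
        restored[original_index] = sorted_segments[position]
    return b"\n".join(restored)


def make_interleaved_log(templates: int = 200, repeats: int = 20, seed: int = 0) -> bytes:
    """Synthetic data (an assumption for illustration): each line recurs,
    but its copies lie about 40 KB apart, beyond DEFLATE's 32 KB window."""
    rng = random.Random(seed)
    lines = [
        bytes(rng.getrandbits(8) for _ in range(100)).hex().encode()
        for _ in range(templates)
    ]
    return b"\n".join(lines * repeats)


if __name__ == "__main__":
    data = make_interleaved_log()
    blob, order = segment_sort_compress(data)
    assert segment_sort_decompress(blob, order) == data
    print("plain zlib:", len(zlib.compress(data, 9)), "bytes")
    print("sorted + zlib:", len(blob), "bytes (permutation not counted)")
```

In this sketch the permutation `order` must be stored alongside the compressed bytes to restore the original order, so a fair comparison would count its size as well. Decompression adds only a linear pass to undo the reordering.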