A Compression Method for Clustered Bit-Vectors

A bit-vector can be compressed if the frequency of zeroes (or, equally, of ones) differs from 0.5, or if the vector is clustered in some way (i.e. not random). There are several compression methods, some of which are presented in references [1-3]. The methods can be divided into three types:

(1) Fixed-to-variable encoding: The bit-vector is divided into fixed-length sub-vectors, which are replaced with variable-length codewords.

(2) Variable-to-fixed encoding: The bit-vector is divided into sub-vectors, so-called runs, which consist of consecutive 0-bits terminated by a 1-bit (or vice versa). The number of 0-bits in a run is called the run-length, and it is represented with a fixed-length number.

(3) Variable-to-variable encoding: The run-length is encoded as a variable-length codeword.

The efficiency of a compression method can be expressed with compression gain, which simply means the ratio
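As an illustration of the second type, variable-to-fixed encoding, the following is a minimal Python sketch rather than a method taken from the references. The function names `run_lengths` and `encode_fixed` and the parameter `width` are hypothetical. The sketch assumes runs of 0-bits terminated by a 1-bit, and it simply rejects any run too long for the chosen width, whereas a practical scheme would need some rule for such runs.

```python
def run_lengths(bits):
    """Split a sequence of 0/1 values into runs of 0-bits ended by a 1-bit.

    Returns the list of run-lengths and the number of trailing 0-bits
    that are not terminated by a 1-bit.
    """
    runs = []
    count = 0
    for b in bits:
        if b == 0:
            count += 1
        else:
            runs.append(count)  # a 1-bit closes the current run
            count = 0
    return runs, count


def encode_fixed(bits, width):
    """Represent every run-length as a binary number of `width` bits.

    Assumption of this sketch: a run longer than 2**width - 1 is an error.
    """
    runs, tail = run_lengths(bits)
    limit = (1 << width) - 1
    if any(r > limit for r in runs):
        raise ValueError("run-length does not fit in the chosen width")
    code = "".join(format(r, "0{}b".format(width)) for r in runs)
    return code, tail


if __name__ == "__main__":
    vector = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1]
    print(run_lengths(vector))      # ([3, 1, 0, 5], 0)
    print(encode_fixed(vector, 3))  # ('011001000101', 0)
```

In this example the 13-bit vector contains four runs, so with a width of 3 bits the run-lengths occupy 12 bits. A sparser or more clustered vector has fewer and longer runs, which is where a fixed-length run-length representation starts to pay off.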