Data compression using word encoding with Huffman code

A technique for compressing large databases is presented. The method replaces frequent variable-length byte strings (words or word fragments) in the database with minimum-redundancy (Huffman) codes. An essential part of the technique is constructing the dictionary so as to yield high compression ratios; a heuristic is used to count the frequencies of word fragments. A detailed analysis of our implementation is provided, in support of high compression ratios and efficient encoding and decoding under the constraint of a fixed amount of main memory. For each phase of our implementation, we explain why particular data structures or techniques are employed. Experimental results show that our compression scheme is very effective for compressing large databases of library records. © 1991 John Wiley & Sons, Inc.
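The core idea described above — replacing frequent variable-length fragments with minimum-redundancy codes — can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a toy dictionary of fragments with given frequencies, a greedy longest-match parse of the input, and a standard heap-based Huffman construction.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build minimum-redundancy (Huffman) codes from symbol frequencies."""
    tiebreak = count()  # stable ordering so equal frequencies compare cleanly
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: recurse into children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                              # leaf: record the accumulated bit string
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

def encode(text, dictionary, codes):
    """Greedy longest-match parse of text against the dictionary, emitting Huffman bits."""
    bits, i = [], 0
    maxlen = max(len(w) for w in dictionary)
    while i < len(text):
        for length in range(min(maxlen, len(text) - i), 0, -1):
            frag = text[i:i + length]
            if frag in dictionary:
                bits.append(codes[frag])
                i += length
                break
        else:
            # a real dictionary would include all single bytes as a fallback
            raise ValueError("no dictionary entry covers position %d" % i)
    return "".join(bits)

def decode(bits, codes):
    """Walk the bit string, emitting a fragment whenever a complete code is seen."""
    inverse = {v: k for k, v in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:   # prefix-free codes make this unambiguous
            out.append(inverse[current])
            current = ""
    return "".join(out)

# Hypothetical fragment dictionary with made-up frequencies.
freqs = {"data": 20, "base": 18, " ": 25, "s": 10, "a": 6, "e": 5, "t": 4, "d": 3, "b": 2}
codes = huffman_codes(freqs)
text = "databases data base"
bits = encode(text, freqs, codes)
assert decode(bits, codes) == text   # lossless round trip
```

Because frequent fragments such as `"data"` receive short codes, the encoded bit string is far smaller than the 8 bits per byte of the raw text; the real gains in the paper come from choosing the dictionary well, which is what the frequency-counting heuristic addresses.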