Hyper-Text Compression Method Based on LZW Dictionary Entry Management

LZW is a popular variant of LZ78 to compress text documents. LZW yields a high compression rate and is widely used by many commercial programs. Its core idea is to assign most probably used character group an entry in a dictionary. If a group of character which is already positioned in a dictionary appears in the streaming data, then an index of a dictionary is replaced in the position of character group. In this paper, we propose a new efficient method to find least used entries in a dictionary using counter. We also achieve higher compression rate by preassigning widely used tags in hyper-text documents. Experimental results show that the proposed method is more effective than V.42bis and Unix compression method. It gives 3∼8% better in the standard Calgary Corpus and 23∼24% better in HTML documents.