A compression scheme allowing direct string matching on compressed binary files and its applications

In this paper, we present a compression scheme that allows direct string matching on compressed files. The scheme can compress general files and is not limited to ASCII text. We apply it to several search programs, including grep and ClamAV, a widely used anti-virus system. Because the files and the patterns are compressed with the same scheme, the programs can scan the compressed files directly for the compressed patterns. Since the compressed file is smaller, searching it takes less time than searching the uncompressed file. We conducted several tests on binary files. For binary executable files, we achieve about 15% space reduction and 15% running time reduction.
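The abstract does not specify the scheme itself, so the following is only a toy illustration of the general idea of compressing the file and the pattern with the same code and matching on the compressed form. It assumes an end-tagged dense code over 4-bit units, which is a stand-in technique and not necessarily the scheme of this paper. All names (`build_table`, `compress`, `find_all`) are hypothetical, and for readability each 4-bit unit is stored in its own byte rather than packed two per byte.

```python
from collections import Counter

STOP = 8  # units 8..15 end a codeword; units 0..7 continue one


def encode_rank(rank: int) -> bytes:
    """Return the end-tagged codeword for a frequency rank in 0..255."""
    if rank < 8:                      # 8 most frequent bytes: 1 unit
        return bytes([STOP | rank])
    if rank < 8 + 64:                 # next 64 bytes: 2 units
        r = rank - 8
        return bytes([r // 8, STOP | (r % 8)])
    r = rank - 72                     # remaining bytes: 3 units
    return bytes([r // 64, (r // 8) % 8, STOP | (r % 8)])


def build_table(sample: bytes) -> dict[int, bytes]:
    """Rank all 256 byte values by frequency in `sample` and assign codes."""
    counts = Counter(sample)
    ranked = sorted(range(256), key=lambda b: (-counts[b], b))
    return {b: encode_rank(rank) for rank, b in enumerate(ranked)}


def compress(data: bytes, table: dict[int, bytes]) -> bytes:
    """Encode each byte independently; one 4-bit unit per output byte (toy)."""
    return b"".join(table[b] for b in data)


def find_all(text_code: bytes, pat_code: bytes) -> list[int]:
    """Return unit offsets where the compressed pattern occurs in the text."""
    hits = []
    i = text_code.find(pat_code)
    while i != -1:
        # Accept only matches that start on a codeword boundary: the
        # previous unit must be a stopper (or the match is at offset 0).
        if i == 0 or text_code[i - 1] >= STOP:
            hits.append(i)
        i = text_code.find(pat_code, i + 1)
    return hits


# Usage: the same table must be used for the file and for the pattern.
data = bytes([0, 1, 2, 0, 1, 0, 0, 1, 255, 0, 1]) * 3
table = build_table(data)
hits = find_all(compress(data, table), compress(bytes([0, 1]), table))
print(len(hits), "occurrences found in the compressed stream")
```

Because every codeword ends with a tagged unit, a match that begins right after such a unit is guaranteed to line up with whole codewords, so no decompression is needed to confirm it. A practical implementation would pack two units per byte and would have to handle matches at half-byte offsets, which this sketch ignores.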