Automatic Extraction of Computer Virus Signatures
One way that anti-virus programs identify the presence of a virus in an executable file, a boot record, or memory is by using short identifiers called signatures, which consist of sequences of bytes in the machine code of the virus. A good signature is one that is found in every object infected by the virus, but is unlikely to be found if the virus is not present; i.e., the likelihood of both false negatives and false positives must be minimized. Typically, a human expert chooses a signature for a new virus by means of a laborious, time-consuming procedure. Unfortunately, the accelerating influx of new computer viruses threatens to outpace the ability of human experts to analyze and find signatures for them. To help alleviate this burden, we have developed a statistical method for automatically extracting good signatures from the machine code of a virus. The basic idea is to characterize statistically a large corpus of programs (currently about half a gigabyte), and then to use this information to estimate false-positive probabilities for proposed virus signatures. In effect, the algorithm extrapolates from the corpus to the much larger universe of executable programs which do or might exist. In practice, signatures extracted by this method are very unlikely to generate false positives, even when the scanner that employs them permits some mismatches. This patent-pending technique has been used to either extract or evaluate the more than 2500 virus signatures used by IBM AntiVirus. It obviates the need for a small army of virus analysts, permitting IBM's signature database to be maintained by a single virus expert working half-time.
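The basic idea of scoring candidate signatures against corpus statistics can be illustrated with a minimal sketch. The abstract does not specify the statistical model, so the version below assumes a simple first-order (bigram) byte model with add-one smoothing, considers exact matches only, and uses hypothetical function names (`train_bigram_model`, `log_prob`, `best_signature`). It should be read as an illustration of the general approach, not as the method used in IBM AntiVirus.

```python
import math
from collections import Counter


def train_bigram_model(corpus):
    """Collect byte statistics from a list of `bytes` objects (the corpus).

    Assumption: a first-order Markov (bigram) model stands in for whatever
    statistical characterization the original method actually uses.
    """
    unigrams = Counter()   # how often each byte value occurs
    contexts = Counter()   # how often each byte value is followed by another byte
    bigrams = Counter()    # how often each adjacent byte pair occurs
    for program in corpus:
        unigrams.update(program)
        contexts.update(program[:-1])
        bigrams.update(zip(program, program[1:]))
    return unigrams, contexts, bigrams


def log_prob(seq, model):
    """Estimated log-probability that `seq` appears at a given position
    in an arbitrary program, using add-one smoothing over 256 byte values."""
    unigrams, contexts, bigrams = model
    total = sum(unigrams.values())
    lp = math.log((unigrams[seq[0]] + 1) / (total + 256))
    for a, b in zip(seq, seq[1:]):
        lp += math.log((bigrams[(a, b)] + 1) / (contexts[a] + 256))
    return lp


def best_signature(virus_code, length, model):
    """Return the length-`length` substring of `virus_code` with the lowest
    estimated probability, i.e. the lowest estimated false-positive risk."""
    candidates = [virus_code[i:i + length]
                  for i in range(len(virus_code) - length + 1)]
    return min(candidates, key=lambda s: log_prob(s, model))


# Toy usage: the corpus is dominated by zero bytes and 0x90 bytes, so the
# part of the "virus" that looks least like the corpus is preferred.
corpus = [b"\x00\x00\x00\x00\x90\x90" * 50]
model = train_bigram_model(corpus)
virus = b"\x00\x00\x00\x00\xde\xad\xbe\xef"
signature = best_signature(virus, 4, model)
```

Smoothing is what lets the estimate extrapolate beyond the corpus: a byte sequence never seen in the corpus still receives a small nonzero probability rather than being treated as impossible. A fuller treatment would also account for scanners that tolerate mismatches, which this sketch ignores.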