A Gene-Inspired Malware Detection Approach

Malware detection is an important topic in cyber security. The research presented in this paper studies the disassembled code of Windows executable files and, borrowing the research route of bioinformatics, proposes the concept of the software gene. A distance-based method is proposed to measure the difference between genes, together with a dimensionality reduction method based on a modified clustering algorithm from biological phylogenetic models. Finally, a gene-inspired malware detector is constructed using a Random Forest model. The software gene extraction proposed in this paper is more flexible and generates less data than the widely used n-gram method, and the detector based on genes also performs better. The clustering-based dimensionality reduction retains more comprehensive features and preserves interpretability for software analysis. The detector built on the gene-inspired approach reaches a precision of 96.14%, which is better than traditional methods.
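
To make the pipeline summarized above more concrete, the following is a minimal sketch of how a gene distance and a clustering-based dimensionality reduction might fit together. It is an illustration under stated assumptions, not the method of this paper: genes are assumed to be already extracted as tuples of opcode mnemonics, Levenshtein edit distance stands in for the gene distance, and a simple threshold-based single-linkage grouping stands in for the modified phylogenetic clustering. The names `edit_distance`, `cluster_genes`, `feature_vector`, and the example genes are all hypothetical.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two opcode sequences (assumed gene distance)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_genes(genes, threshold):
    """Group genes whose distance is at most `threshold` (single linkage).

    This is a stand-in for the clustering step; it returns a dict that maps
    each gene to a cluster id, numbered in order of first appearance.
    """
    parent = list(range(len(genes)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if edit_distance(genes[i], genes[j]) <= threshold:
                parent[find(j)] = find(i)

    cluster_ids = {}
    labels = {}
    for i, gene in enumerate(genes):
        labels[gene] = cluster_ids.setdefault(find(i), len(cluster_ids))
    return labels


def feature_vector(sample_genes, labels, n_clusters):
    """Count how many genes of one sample fall into each cluster."""
    counts = Counter(labels[g] for g in sample_genes if g in labels)
    return [counts.get(c, 0) for c in range(n_clusters)]


# Hypothetical gene vocabulary: five genes collapse into three clusters.
genes = [
    ("push", "mov", "call"),
    ("push", "mov", "jmp"),
    ("xor", "add", "ret"),
    ("xor", "sub", "ret"),
    ("nop",),
]
labels = cluster_genes(genes, threshold=1)
n_clusters = len(set(labels.values()))

# One sample described by the genes it contains (with repetition).
sample = [genes[1], genes[3], genes[3], genes[4]]
vector = feature_vector(sample, labels, n_clusters)
```

In this sketch each cluster, rather than each individual gene, becomes one feature dimension, so a feature can still be traced back to a group of similar code fragments. The resulting vectors could then be passed to a Random Forest classifier, for example scikit-learn's `RandomForestClassifier`, which is omitted here to keep the example free of external dependencies.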