A survey of data mining techniques for malware detection using file features

This paper surveys data mining techniques for malware detection using file features. The techniques are categorized by a three-tier hierarchy of file features, analysis type, and detection type. File features are the features extracted from binary programs; analysis type is either static or dynamic; and detection type, borrowed from intrusion detection, is either misuse or anomaly detection. The survey presents the major advances in malware research that apply data mining to file features and categorizes the surveyed work according to this hierarchy, which is the paper's main contribution.
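As a rough illustration of the three-tier hierarchy, the sketch below models each surveyed technique as a record with a file feature, an analysis type, and a detection type, then groups records by those three tiers. The class names, the `group_by_hierarchy` helper, and the sample entries ("technique A", "feature_x", and so on) are hypothetical placeholders introduced for this example; they are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class AnalysisType(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


class DetectionType(Enum):
    MISUSE = "misuse"
    ANOMALY = "anomaly"


@dataclass(frozen=True)
class SurveyedTechnique:
    """One surveyed work, placed in the three-tier hierarchy."""
    name: str                 # hypothetical label for the work
    file_feature: str         # tier 1: feature extracted from the binary
    analysis: AnalysisType    # tier 2: static or dynamic
    detection: DetectionType  # tier 3: misuse or anomaly


def group_by_hierarchy(techniques):
    """Group technique names by (file feature, analysis, detection)."""
    groups = defaultdict(list)
    for t in techniques:
        groups[(t.file_feature, t.analysis, t.detection)].append(t.name)
    return dict(groups)


# Placeholder entries only; real entries would come from the surveyed papers.
examples = [
    SurveyedTechnique("technique A", "feature_x",
                      AnalysisType.STATIC, DetectionType.MISUSE),
    SurveyedTechnique("technique B", "feature_x",
                      AnalysisType.STATIC, DetectionType.MISUSE),
    SurveyedTechnique("technique C", "feature_y",
                      AnalysisType.DYNAMIC, DetectionType.ANOMALY),
]

groups = group_by_hierarchy(examples)
for (feature, analysis, detection), names in groups.items():
    print(feature, analysis.value, detection.value, names)
```

Keying the groups on the full (feature, analysis, detection) tuple keeps the three tiers explicit while letting several works share one category.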
