Malware detection using machine learning

We propose a versatile framework in which different machine learning algorithms can be employed to distinguish between malware files and clean files, while aiming to minimise the number of false positives. In this paper we present the ideas behind our framework, working first with cascade one-sided perceptrons and then with cascade kernelized one-sided perceptrons. After being successfully tested on medium-size datasets of malware and clean files, the ideas behind this framework were subjected to a scaling-up process that enables us to work with very large datasets of malware and clean files.
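To make the false-positive goal concrete, the following is a minimal sketch of one way a perceptron can be made "one-sided". It is an illustration under stated assumptions, not the training procedure of the paper: a standard perceptron is trained first, and its bias is then shifted so that no clean training sample is flagged as malware. The function names, the label convention (+1 malware, -1 clean), and the bias-shifting step are all assumptions made for this example; the cascade and kernelized variants are not shown.

```python
def score(w, b, x):
    """Linear score of feature vector x under weights w and bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Return +1 (malware) if the score is positive, else -1 (clean)."""
    return 1 if score(w, b, x) > 0 else -1


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Standard perceptron updates; labels are +1 (malware) or -1 (clean)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * score(w, b, x) <= 0:  # misclassified or on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def make_one_sided(w, b, samples, labels, margin=1e-6):
    """Assumed one-sided step: lower the bias until every clean training
    sample scores below zero, trading detection rate for zero false
    positives on the training set."""
    clean_scores = [score(w, b, x) for x, y in zip(samples, labels) if y == -1]
    worst = max(clean_scores, default=float("-inf"))
    if worst >= 0:
        b -= worst + margin
    return w, b
```

After `make_one_sided`, every clean training file is classified as clean by construction, at the cost of possibly missing some malware files; this guarantee applies to the training set only and says nothing about unseen files.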
