Feature Creation Using Genetic Algorithms for Zero False Positive Malware Classification

This paper presents a Genetic Programming approach to feature extraction within the framework of the perceptron algorithm described in [1]. Feature extraction can increase classification accuracy, but exhaustively exploring the space of possible combinations of the initial 45,150 features is infeasible; Genetic Programming offers a practical way to search that space for relevant features. The extracted features are then used to train the One Side Class Perceptron, an algorithm designed to minimize the number of false positives, and its accuracy increases as a result. In the experiments, the classifier using the extracted features was run on a dataset of 358,144 files. The results show that the overall approach and implementation are fit for real-world malware detection.
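The idea of searching for combined features under a zero-false-positive constraint can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: samples are lists of boolean base features, a derived feature is the logical AND of two base features, and the names `combine`, `fitness`, `mutate` and `evolve` are hypothetical. It uses a simple mutation-only evolutionary loop, not the authors' Genetic Programming setup or the One Side Class Perceptron itself.

```python
import random


def combine(sample, pair):
    """Derived feature (assumed form): logical AND of two boolean base features."""
    i, j = pair
    return bool(sample[i]) and bool(sample[j])


def fitness(pair, malware, benign):
    """Count malware samples the derived feature fires on.

    Returns -1 if the feature fires on any benign sample, so a single
    false positive disqualifies it (an illustrative choice of penalty).
    """
    if any(combine(s, pair) for s in benign):
        return -1
    return sum(1 for s in malware if combine(s, pair))


def mutate(pair, n_features, rng):
    """Replace one side of the pair with a randomly chosen base feature index."""
    i, j = pair
    if rng.random() < 0.5:
        i = rng.randrange(n_features)
    else:
        j = rng.randrange(n_features)
    return (i, j)


def evolve(malware, benign, n_features, pop_size=20, generations=30, seed=0):
    """Evolve feature pairs; the better half survives each generation unchanged."""
    rng = random.Random(seed)
    population = [
        (rng.randrange(n_features), rng.randrange(n_features))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, malware, benign), reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), n_features, rng) for _ in survivors]
        population = survivors + children
    return max(population, key=lambda p: fitness(p, malware, benign))
```

A call such as `evolve(malware, benign, n_features=3)` returns the best pair of base feature indices found. Because the better half of the population is carried over unchanged, the best fitness never decreases from one generation to the next, while the search only ever evaluates a small fraction of all possible pairs.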

[1] Ingo Mierswa, et al. A Hybrid Approach to Feature Selection and Generation Using an Evolutionary Algorithm, 2003.

[2] Michael F. P. O'Boyle, et al. Automatic Feature Generation for Machine Learning Based Optimizing Compilation, 2009, 2009 International Symposium on Code Generation and Optimization.

[3] Manabu Kotani, et al. Feature Extraction Using Genetic Algorithms, 1999.

[4] Asoke K. Nandi, et al. Feature generation using genetic programming with application to fault classification, 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[5] Muhammad Zubair Shafiq, et al. On the appropriateness of evolutionary rule learning algorithms for malware detection, 2009, GECCO '09.

[6] Santosh K. Mishra, et al. De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures, 2007, Bioinform.

[7] Gary B. Lamont, et al. A retrovirus inspired algorithm for virus detection & optimization, 2006, GECCO.

[8] Muddassar Farooq, et al. IMAD: in-execution malware analysis and detection, 2009, GECCO.

[9] Nawwaf N. Kharma, et al. Evolving novel image features using Genetic Programming-based image transforms, 2009, 2009 IEEE Congress on Evolutionary Computation.

[10] Wouter Joosen, et al. Evolutionary algorithms for classification of malware families through different network behaviors, 2014, GECCO.

[11] Richard J. Enbody, et al. Further Research on Feature Selection and Classification Using Genetic Algorithms, 1993, ICGA.

[12] Razvan Benchea, et al. Optimized Zero False Positives Perceptron Training for Malware Detection, 2012, 2012 14th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing.