PageRank in malware categorization

In this paper, we propose a malware categorization method that uses PageRank to model malware behavior in terms of instructions. PageRank ranks web pages according to the link structure among them; the same computation can rank the instructions of a program, yielding scores that capture how instructions are structurally related to one another. Our method uses these instruction ranks as features for machine learning classifiers. In the evaluation, we compare the effectiveness of several PageRank variants and investigate whether bagging and boosting improve categorization accuracy.
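The abstract does not specify how the instruction graph is built. As a minimal sketch, assume that nodes are instruction mnemonics and that a weighted edge a→b counts how often b immediately follows a in an execution trace. Under that assumption, PageRank can be computed by power iteration, and the ranks can be read off in a fixed mnemonic order to form one feature vector per sample. The function names, the trace format, the transition-count weighting, and the damping factor of 0.85 (the value commonly used with PageRank) are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def instruction_graph(trace):
    """Assumed graph model: edge a -> b is weighted by how often
    mnemonic b immediately follows mnemonic a in the trace."""
    edges = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(trace, trace[1:]):
        edges[src][dst] += 1
    return edges


def pagerank(trace, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank over the instruction graph via power iteration."""
    if not trace:
        return {}
    nodes = sorted(set(trace))
    n = len(nodes)
    edges = instruction_graph(trace)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no successors) is spread uniformly,
        # so the ranks keep summing to 1.
        dangling = sum(rank[v] for v in nodes if not edges.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, outs in edges.items():
            total = sum(outs.values())
            for dst, weight in outs.items():
                new[dst] += damping * rank[src] * weight / total
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def rank_features(trace, vocabulary):
    """Fixed-length feature vector: one PageRank score per mnemonic in the
    (hypothetical) vocabulary; mnemonics absent from the trace score 0."""
    ranks = pagerank(trace)
    return [ranks.get(m, 0.0) for m in vocabulary]


if __name__ == "__main__":
    trace = ["push", "mov", "call", "mov", "pop", "ret"]
    vocabulary = ["call", "mov", "pop", "push", "ret", "xor"]
    print(rank_features(trace, vocabulary))
```

Because every sample is mapped onto the same vocabulary, the resulting vectors have equal length and can be passed to any standard classifier, including bagged or boosted ensembles.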
