Scalable Function Call Graph-based Malware Classification

To preserve the structural information in malware binaries during feature extraction, several malware classification studies have used function call graph-based features. However, classification on these graphs usually relies on computing graph similarity with computationally intensive techniques. As a result, much of the previous work in this area incurs a large performance overhead and does not scale well. In this paper, we propose a linear-time function call graph (FCG) vector representation based on function clustering that yields significant performance gains in addition to improved classification accuracy. We also show how this representation makes it possible to use graph features together with other non-graph features.
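
As a rough illustration of how a clustering-based FCG vector could be built in a single pass over the graph, the Python sketch below counts functions per cluster and calls between cluster pairs. The abstract does not give these details, so the function name `fcg_vector`, the input format, and the choice of vertex and edge counts as features are all assumptions. The cluster assignment is taken as given, and the sketch should not be read as the paper's exact method.

```python
def fcg_vector(call_edges, cluster_of, n_clusters):
    """Flatten a function call graph into a fixed-length vector.

    call_edges: iterable of (caller, callee) function names (hypothetical format)
    cluster_of: dict mapping each function name to a cluster id in [0, n_clusters)
    n_clusters: number of function clusters, assumed fixed across all samples
    """
    # One pass over the vertices: how many functions fall into each cluster.
    vertex_counts = [0] * n_clusters
    for cluster in cluster_of.values():
        vertex_counts[cluster] += 1

    # One pass over the edges: how many calls go from cluster i to cluster j.
    edge_counts = [0] * (n_clusters * n_clusters)
    for caller, callee in call_edges:
        i, j = cluster_of[caller], cluster_of[callee]
        edge_counts[i * n_clusters + j] += 1

    # Fixed-length result: n_clusters vertex features, then n_clusters^2 edge features.
    return vertex_counts + edge_counts


# Toy graph with three functions and two clusters (made-up data).
edges = [("main", "parse"), ("main", "send"), ("parse", "send")]
clusters = {"main": 0, "parse": 1, "send": 1}
vector = fcg_vector(edges, clusters, n_clusters=2)
```

For a fixed number of clusters, the work grows linearly with the number of functions and calls. Because every sample maps to a vector of the same length, the result can be concatenated with non-graph feature vectors before training a standard classifier.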