Recent literature has proposed approaches for detecting code-sharing relationships between malware artifacts, which help accelerate the malware reverse engineering process. In this paper we propose a novel code-sharing analysis technique that complements existing methods. Our algorithm partitions malware system call logs into subsequences by identifying the points in each log where the set of saved instruction pointers on the program call stack changes significantly. Each extracted subsequence therefore corresponds to system calls made from a local region of the program call graph. We then use these subsequences as features for computing a malware sample similarity matrix. A unique contribution of our method is that it incorporates sequence information into the features it uses for similarity analysis, yet, unlike previously proposed longest common substring methods, it runs in linear time. Similarly, our method incorporates call stack information into its features but is computationally far more tractable than previously proposed call graph isomorphism techniques. Because we extract information from sample behavior logs, we avoid the problem of obfuscated samples that resist static analysis tools. We evaluated our method on a corpus of 959 samples, where it achieves high precision when measured against known malware family labels.
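As an illustration of the partitioning idea, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes each log entry is a pair of a system call name and the set of saved instruction pointers observed on the stack at that call, treats a "significant" stack change as the Jaccard overlap between consecutive stack sets falling below a threshold, and compares samples by the Jaccard index of their subsequence sets. The names `partition`, `similarity_matrix`, the `threshold` value, and the sample logs are all hypothetical choices made for this example.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

# One log entry: (system call name, saved instruction pointers on the call stack).
# This log layout is an assumption made for the sketch.
Event = Tuple[str, FrozenSet[int]]


def jaccard(a, b) -> float:
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def partition(log: Sequence[Event], threshold: float = 0.5) -> List[Tuple[str, ...]]:
    """Split a log into subsequences in a single pass over its events.

    A new subsequence starts whenever the stack overlap with the previous
    event drops below `threshold` (an assumed definition of a significant
    call stack change).
    """
    segments: List[Tuple[str, ...]] = []
    current: List[str] = []
    prev_stack = None
    for name, stack in log:
        if prev_stack is not None and jaccard(prev_stack, stack) < threshold:
            segments.append(tuple(current))
            current = []
        current.append(name)
        prev_stack = stack
    if current:
        segments.append(tuple(current))
    return segments


def similarity_matrix(logs: Sequence[Sequence[Event]], threshold: float = 0.5) -> List[List[float]]:
    """Pairwise Jaccard similarity over each sample's set of subsequences."""
    features: List[Set[Tuple[str, ...]]] = [set(partition(log, threshold)) for log in logs]
    return [[jaccard(a, b) for b in features] for a in features]


# Hypothetical logs: sample_a performs file activity and then network activity
# from different call stack regions; sample_b shares only the network routine.
sample_a: List[Event] = [
    ("open", frozenset({0x10, 0x20})),
    ("read", frozenset({0x10, 0x20, 0x30})),
    ("connect", frozenset({0x90, 0xA0})),
    ("send", frozenset({0x90, 0xA0})),
]
sample_b: List[Event] = [
    ("connect", frozenset({0x90, 0xA0})),
    ("send", frozenset({0x90, 0xA0})),
]

matrix = similarity_matrix([sample_a, sample_b])
```

In this sketch the partitioning step visits each log entry once, so its cost grows linearly with log length for a bounded stack depth; building the full similarity matrix still compares every pair of samples.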