MalCommunity: A Graph-Based Evaluation Model for Malware Family Clustering

Malware clustering plays an important role in large-scale malware homology analysis. However, the way the ground truth data is generated is often overlooked. Labels from anti-virus (AV) engines are most commonly used, but some of them are inaccurate or inconsistent. To overcome this drawback, many researchers build ground truth data with a voting mechanism such as AVclass, but this approach makes it difficult to evaluate clustering results of different granularity. Graph-based methods such as VAMO are more robust but need to maintain a large database. In this paper, we propose a novel evaluation model named MalCommunity, built on a graph we call the Malware Relation Graph. Unlike VAMO, constructing this graph does not require a large database; it needs only the AV label information of the samples in the test set. We apply the Fast Newman community detection algorithm to partition the sample set and use modularity to measure the target clustering results. The experimental results indicate that our model is robust to two kinds of noise in AV labels: inconsistency in malware family classification and inconsistency in granularity. Our model also makes it convenient to evaluate clustering methods of different granularity with different heights.
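The pipeline described above (graph from AV labels, greedy modularity-based community detection, modularity as the score) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The edge-weight rule (number of engines that give two samples the same label), the toy labels, and every function name here are assumptions made for illustration, and the merge loop is a brute-force variant of Newman-style greedy agglomeration rather than an optimized Fast Newman.

```python
from collections import defaultdict
from itertools import combinations


def build_relation_graph(av_labels):
    """Assumed construction: weight = number of AV engines that assign
    the same label to both samples (hypothetical rule, not from the paper)."""
    weights = {}
    for a, b in combinations(sorted(av_labels), 2):
        shared = sum(
            1 for engine, label in av_labels[a].items()
            if av_labels[b].get(engine) == label
        )
        if shared:
            weights[(a, b)] = shared
    return weights


def modularity(weights, partition):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ] for a
    partition given as a dict mapping node -> community id."""
    total = sum(weights.values())  # m: total edge weight
    if total == 0:
        return 0.0
    degree = defaultdict(float)    # d_c: summed weighted degree per community
    internal = defaultdict(float)  # L_c: edge weight inside each community
    for (a, b), w in weights.items():
        degree[partition[a]] += w
        degree[partition[b]] += w
        if partition[a] == partition[b]:
            internal[partition[a]] += w
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)


def greedy_communities(nodes, weights):
    """Start from singletons and repeatedly apply the merge of two
    connected communities that raises Q the most; stop when none does."""
    partition = {n: i for i, n in enumerate(nodes)}
    best_q = modularity(weights, partition)
    while True:
        best_merge = None
        pairs = {tuple(sorted((partition[a], partition[b])))
                 for (a, b) in weights if partition[a] != partition[b]}
        for c1, c2 in sorted(pairs):
            trial = {n: (c1 if c == c2 else c) for n, c in partition.items()}
            q = modularity(weights, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:
            return partition, best_q
        partition = best_merge


# Toy AV labels (invented): engine B names the family differently and
# engine C falls back to a coarse "generic" label for two samples.
av_labels = {
    "s1": {"A": "zbot", "B": "zeus", "C": "zbot"},
    "s2": {"A": "zbot", "B": "zeus", "C": "generic"},
    "s3": {"A": "zbot", "B": "zeus", "C": "zbot"},
    "s4": {"A": "virut", "B": "virut", "C": "generic"},
    "s5": {"A": "virut", "B": "virut", "C": "virut"},
}
weights = build_relation_graph(av_labels)
partition, q = greedy_communities(sorted(av_labels), weights)

# Score two candidate clusterings on the same graph.
good = {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1}
bad = {"s1": 0, "s4": 0, "s2": 1, "s3": 1, "s5": 1}
q_good, q_bad = modularity(weights, good), modularity(weights, bad)
```

In this toy setting, a clustering that follows the label structure scores a higher modularity than one that mixes the two groups, which is the sense in which modularity can serve as an evaluation measure for a target clustering.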

[1]  Utkarsh Upadhyay,et al.  A Broad View of the Ecosystem of Socially Engineered Exploit Documents , 2017, NDSS.

[2]  Christopher Krügel,et al.  Nazca: Detecting Malware Distribution in Large-Scale Networks , 2014, NDSS.

[3]  Niels Provos,et al.  CAMP: Content-Agnostic Malware Protection , 2013, NDSS.

[4]  Kang G. Shin,et al.  DUET: integration of dynamic and static analyses for malware clustering with cluster ensembles , 2013, ACSAC.

[5]  Claudia Eckert,et al.  Counteracting Data-Only Malware with Code Pointer Examination , 2015, RAID.

[6]  Sankardas Roy,et al.  Deep Ground Truth Analysis of Current Android Malware , 2017, DIMVA.

[7]  Duen Horng Chau,et al.  Guilt by association: large scale malware detection by mining file-relation graphs , 2014, KDD.

[8]  Fabian Monrose,et al.  Cache, Trigger, Impersonate: Enabling Context-Sensitive Honeyclient Analysis On-the-Wire , 2016, NDSS.

[9]  Peng Li,et al.  On Challenges in Evaluating Malware Clustering , 2010, RAID.

[10]  Khaled Yakdan,et al.  Helping Johnny to Analyze Malware: A Usability-Optimized Decompiler and Malware Analysis User Study , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[11]  Tyler Moore,et al.  Polymorphic Malware Detection Using Sequence Classification Methods , 2016, 2016 IEEE Security and Privacy Workshops (SPW).

[12]  Kevin Leach,et al.  LO-PHI: Low-Observable Physical Host Instrumentation for Malware Analysis , 2016, NDSS.

[13]  Roberto Perdisci,et al.  VAMO: towards a fully automated malware clustering validity analysis , 2012, ACSAC '12.

[14]  Konstantin Berlin,et al.  Deep neural network based malware detection using two dimensional binary program features , 2015, 2015 10th International Conference on Malicious and Unwanted Software (MALWARE).

[15]  Christopher Krügel,et al.  Scalable, Behavior-Based Malware Clustering , 2009, NDSS.

[16]  Johannes Bader,et al.  A Comprehensive Measurement Study of Domain Generating Malware , 2016, USENIX Security Symposium.

[17]  Guofei Gu,et al.  GoldenEye: Efficiently and Effectively Unveiling Malware's Targeted Environment , 2014, RAID.

[18]  Michalis Vazirgiannis,et al.  On Clustering Validation Techniques , 2001 .

[19]  Giovanni Vigna,et al.  MalGene: Automatic Extraction of Malware Analysis Evasion Signature , 2015, CCS.

[20]  Fang Yu,et al.  Finding the Linchpins of the Dark Web: a Study on Topologically Dedicated Hosts on Malicious Web Infrastructures , 2013, 2013 IEEE Symposium on Security and Privacy.

[21]  Juan Caballero,et al.  AVclass: A Tool for Massive Malware Labeling , 2016, RAID.

[22]  B. S. Manjunath,et al.  SigMal: a static signal processing based malware triage , 2013, ACSAC.

[23]  M E J Newman,et al.  Modularity and community structure in networks. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[24]  Leyla Bilge,et al.  The Dropper Effect: Insights into Malware Distribution with Downloader Graph Analytics , 2015, CCS.