Semi-Supervised Malware Clustering Based on the Weight of Bytecode and API

With the rapid advance of anti-virus and anti-tracking technologies, three aspects of malware clustering need to be improved: the robustness of features, the accuracy of similarity measurements, and the effectiveness of clustering algorithms. In this paper, we propose a novel malware family clustering approach based on weighted dynamic and static features. The approach employs a new similarity measurement method based on the Earth Mover's Distance (EMD) to improve the accuracy of feature similarities. In addition, to reduce convergence time and improve clustering purity, we design a novel semi-supervised clustering algorithm, termed S-DBSCAN, which incorporates supervision information into the original Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The experimental results demonstrate that the proposed approach accurately distinguishes samples from different families and achieves a superior clustering purity of 98.7%.
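
The two ideas named above, an EMD-based similarity and a density-based clustering guided by a few known labels, can be illustrated with a minimal sketch. It is not the S-DBSCAN algorithm or the feature weighting of this paper, whose details are not given here. It assumes that each sample is a weighted histogram over ordered, unit-spaced bins (so the EMD reduces to a sum of absolute CDF differences), and that supervision takes the form of a few samples with known family labels that must not share a cluster when their labels differ. The names `emd_1d`, `seeded_dbscan`, and `seeds` are hypothetical.

```python
from collections import deque
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two weighted histograms over the same ordered bins.

    Assumption: bins are unit-spaced, so the EMD equals the sum of
    absolute differences between the two normalised CDFs.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("histograms must have positive total weight")
    cdf_p = accumulate(x / sp for x in p)
    cdf_q = accumulate(x / sq for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def seeded_dbscan(dist, eps, min_pts, seeds):
    """DBSCAN over a precomputed distance matrix with label constraints.

    dist  : n x n list of pairwise distances
    seeds : dict {sample index: known family label} (hypothetical form
            of supervision; the paper's mechanism may differ)
    Returns a list of cluster ids, with -1 marking noise.
    """
    n = len(dist)
    labels = [None] * n  # None = unvisited, -1 = noise

    def neighbours(i):
        return [j for j in range(n) if dist[i][j] <= eps]

    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # may later become a border point
            continue
        labels[i] = cid
        family = seeds.get(i)  # known family of this cluster, if any
        queue = deque(nb)
        while queue:
            j = queue.popleft()
            if labels[j] not in (None, -1):
                continue  # already assigned to a cluster
            sj = seeds.get(j)
            if sj is not None and family is not None and sj != family:
                continue  # conflicting label: leave j for another cluster
            if family is None:
                family = sj
            was_noise = labels[j] == -1
            labels[j] = cid
            if was_noise:
                continue  # border point: do not expand from it
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is a core point: keep expanding
        cid += 1
    return labels
```

In this sketch, a pairwise distance matrix built with `emd_1d` can be passed directly as `dist`. With an empty `seeds` dictionary the function behaves like plain DBSCAN, and each labelled sample only prevents a cluster from absorbing samples of a different known family.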
