A Framework for Analyzing Ransomware using Machine Learning

Ransomware attacks have increased in recent years, causing significant damage and disruption to businesses. Forensic analysis, such as reverse engineering of executables (binary files), is a common practice for examining the characteristics of such malware. In this work, we developed a reverse engineering framework that incorporates feature generation engines and machine learning (ML) to detect ransomware efficiently. The framework performs multi-level analysis (of raw binaries, assembly code, libraries, and function calls) to better examine and interpret the purpose of malware code segments. We use the Linux object-code dump tool (objdump) and a portable executable (PE) parser to decode binaries into assembly-level instructions and to extract the dynamic-link libraries (DLLs) they use. Experiments consider both ransomware and normal binaries: samples are first pre-processed to extract features, and different supervised ML techniques are then applied to classify them. The detection accuracy for ransomware samples ranged from 76% to 97%, depending on the ML technique used. Of the eight ML classifiers tested, seven achieved a detection rate of at least 90%. The study also demonstrates that combining static analysis at the assembly (ASM) level and the DLL level can better distinguish ransomware from normal binaries.
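
The abstract describes the pipeline only at a high level (decode with objdump and a PE parser, extract features, train supervised classifiers). The following is therefore a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes GNU `objdump -d` output, where each instruction line holds the address, the raw bytes, and the mnemonic separated by tabs. It also takes the imported DLL names as a plain list, standing in for what a PE parser would read from the import table. The specific features (opcode bigram frequencies for the ASM level, binary import indicators for the DLL level), the vocabularies, and all function names are hypothetical. The abstract does not name its eight classifiers, so a simple nearest-centroid classifier is used here purely as a stand-in.

```python
from collections import Counter
from math import sqrt


def opcode_sequence(objdump_text):
    """Extract instruction mnemonics from `objdump -d` text.

    Assumes GNU objdump's layout for instruction lines:
    "address:<TAB>raw bytes<TAB>mnemonic operands". Header lines and
    byte-only continuation lines (no third tab field) are skipped.
    """
    ops = []
    for line in objdump_text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 3 and parts[0].strip().endswith(":") and parts[2].strip():
            ops.append(parts[2].split()[0])  # keep the mnemonic, drop operands
    return ops


def opcode_ngrams(ops, n=2):
    """Count opcode n-grams (runs of n consecutive mnemonics)."""
    return Counter(tuple(ops[i:i + n]) for i in range(len(ops) - n + 1))


def feature_vector(objdump_text, dlls, vocab_ngrams, vocab_dlls, n=2):
    """Concatenate hypothetical ASM-level and DLL-level features.

    ASM part: relative frequency of each n-gram in vocab_ngrams.
    DLL part: 1.0 if the binary imports that DLL (case-insensitive), else 0.0.
    """
    grams = opcode_ngrams(opcode_sequence(objdump_text), n)
    total = sum(grams.values()) or 1  # avoid dividing by zero on empty input
    asm = [grams[g] / total for g in vocab_ngrams]
    imported = {d.lower() for d in dlls}
    dll = [1.0 if d.lower() in imported else 0.0 for d in vocab_dlls]
    return asm + dll


def train_centroids(vectors, labels):
    """Stand-in classifier: average the feature vectors of each class."""
    sums, counts = {}, Counter(labels)
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}


def predict(centroids, v):
    """Return the label whose centroid is closest (Euclidean distance)."""
    def dist(c):
        return sqrt(sum((a - b) ** 2 for a, b in zip(c, v)))
    return min(centroids, key=lambda y: dist(centroids[y]))


def toy_dump(mnemonics):
    """Render mnemonics as objdump-like lines (made-up toy input only)."""
    return "\n".join(f"  {0x401000 + i:x}:\t90 \t{m}" for i, m in enumerate(mnemonics))


if __name__ == "__main__":
    # Made-up toy samples; they only illustrate the data flow end to end.
    ngrams = [("xor", "xor"), ("push", "mov")]
    dlls = ["advapi32.dll", "kernel32.dll"]
    train = [
        (toy_dump(["xor", "xor", "xor", "ret"]), ["ADVAPI32.dll"], "ransomware"),
        (toy_dump(["push", "mov", "pop", "ret"]), ["KERNEL32.dll"], "normal"),
    ]
    X = [feature_vector(d, imp, ngrams, dlls) for d, imp, _ in train]
    model = train_centroids(X, [y for _, _, y in train])
    query = feature_vector(toy_dump(["xor", "xor", "ret"]), ["advapi32.dll"], ngrams, dlls)
    print(predict(model, query))  # → ransomware
```

Concatenating the ASM-level and DLL-level parts into one vector is one simple way to combine the two levels of static analysis. In practice the vectors would more likely be passed to a library classifier (for example a random forest or an SVM) than to this toy stand-in.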
