Novel Solutions for Malicious Code Detection and Family Clustering Based on Machine Learning

Malware has become a major threat to cyberspace security. This is not only because malware itself is growing more complex, but also because new malicious code is continuously being created and produced. In this paper, we propose two novel methods for the malware identification problem. The first addresses malware classification: unlike traditional machine learning approaches that rely on a single model, it uses ensemble models to classify malware. The second addresses malware family clustering: unlike classic malware family clustering algorithms, it uses the t-SNE algorithm to visualize the feature data and then determines the number of malware families from that visualization. Both methods have been extensively tested on a large number of real-world malware samples. The results show that the first method substantially outperforms existing individual models and that the second adapts well. Our methods can therefore be used for malicious code classification and family clustering with higher accuracy.
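The abstract describes both pipelines only at a high level. The Python sketch below, assuming scikit-learn and synthetic feature vectors in place of real malware features, illustrates their general shape. The first part is a soft-voting ensemble; the choice of logistic regression, random forest, and gradient boosting as base models is hypothetical. The second part is a t-SNE projection used to choose the number of families before clustering. The abstract does not name the clustering algorithm, so k-means here is an assumption. `make_toy_features`, `build_ensemble`, `embed_2d`, and `cluster_families` are hypothetical helper names, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split


def make_toy_features(n_families=4, n_samples=400, n_features=20, seed=0):
    # Hypothetical stand-in for real per-sample malware feature vectors:
    # each Gaussian blob plays the role of one malware family.
    X, y = make_blobs(
        n_samples=n_samples,
        n_features=n_features,
        centers=n_families,
        cluster_std=2.0,
        random_state=seed,
    )
    return X, y


def build_ensemble(seed=0):
    # Soft voting averages the predicted class probabilities of several
    # heterogeneous base models; the three models chosen here are assumptions.
    return VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
            ("gb", GradientBoostingClassifier(random_state=seed)),
        ],
        voting="soft",
    )


def embed_2d(X, seed=0):
    # Project the feature vectors to 2-D with t-SNE so an analyst can
    # scatter-plot them and count the visually distinct groups.
    tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=seed)
    return tsne.fit_transform(X)


def cluster_families(X, k, seed=0):
    # Once k has been read off the t-SNE plot, cluster the original features.
    # k-means is an assumed choice; the abstract does not name the algorithm.
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)


if __name__ == "__main__":
    X, y = make_toy_features()
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    clf = build_ensemble().fit(X_tr, y_tr)
    print("ensemble test accuracy:", clf.score(X_te, y_te))

    emb = embed_2d(X)  # scatter-plot emb[:, 0] vs emb[:, 1] to choose k
    k = 4  # hypothetical value an analyst might read off the plot
    labels = cluster_families(X, k)
    print("samples per cluster:", np.bincount(labels))
```

In practice, an analyst would scatter-plot the 2-D embedding and take the number of visually distinct groups as `k`. The sketch clusters the original features rather than the embedding because t-SNE preserves local neighborhoods but not global distances between groups.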