Malware Analytics: Review of Data Mining, Machine Learning and Big Data Perspectives

Recent advances in cyber technologies have made human life easier, but they can also impose heavy costs in the form of economic, psychological, or reputational damage. Such damage may, for instance, be caused by malware variants that propagate in a hidden and largely untraceable way. Malware analytics deals with the approaches and techniques used to derive the distinguishing characteristics of malware for robust cyber defense. This paper presents the current status of malware research, its challenges, and the methods used to overcome those challenges from data mining, machine learning, and big data perspectives. We have considered these three perspectives because of their extensive computational value; they are frequently combined to solve a wide range of problems in security, medicine, finance, and industry. How these domains are used, whether as independent techniques or in combination, depends on the nature of the dataset considered. We have also proposed a framework to address the challenges and open issues prevalent in malware analytics. We hope that this paper, with its simplified presentation of the most vital approaches in malware analytics, will help aspiring researchers and newcomers to the security field explore further, and will encourage budding engineers to choose malware analysis as their field of study. Specifically, the analysis of state-of-the-art approaches, covering their evaluation, pros and cons, current challenges, and future directions, aims to equip all malware enthusiasts.
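To make "distinguishing characteristics" of malware concrete, the following is a minimal sketch, assuming static byte n-gram frequencies as the features and a nearest-neighbour rule with cosine similarity as the classifier. These are illustrative choices, not the framework proposed in this paper; the function names (`byte_ngrams`, `cosine_similarity`, `classify`) are hypothetical, and the byte strings are toy stand-ins rather than real malware samples.

```python
from collections import Counter
import math

# Illustrative sketch only: byte n-gram features + nearest-neighbour labelling.
# This is NOT the paper's framework; names and data below are hypothetical.


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a binary blob (a common static feature)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys without inserting them.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(sample: bytes, labeled: list[tuple[bytes, str]], n: int = 2) -> str:
    """Return the label of the most similar reference sample (1-nearest neighbour)."""
    features = byte_ngrams(sample, n)
    best_label, best_score = "unknown", -1.0
    for ref_bytes, label in labeled:
        score = cosine_similarity(features, byte_ngrams(ref_bytes, n))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy reference set (hypothetical byte strings, not real samples).
reference = [
    (b"\x90\x90\x90\xeb\xfe\x90\x90\x90", "malicious"),  # NOP run + self-jump pattern
    (b"Hello, world! Hello, world!", "benign"),
]
print(classify(b"\x90\x90\x90\x90\xeb\xfe", reference))  # → malicious
```

In practice, the data mining, machine learning, and big data perspectives discussed in this paper would replace each piece of this toy pipeline: richer features (opcodes, API calls, permissions, behaviour graphs), trained classifiers, and distributed processing over large sample collections.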
