A Multi-Perspective malware detection approach through behavioral fusion of API call sequence

Abstract: The rapid growth of the malware industry is considered a main threat to our e-society. Malware analysis should therefore be enriched with smart heuristic tools that recognize malicious behaviors effectively. Although an API call graph generated for a malicious process encodes valuable information about its behavior, generating a behavior graph for every process is impractical. We therefore experimented with creating generic behavioral graph models that describe malicious and non-malicious processes. These behavioral models rely on the fusion of statistical, contextual, and graph mining features that capture explicit and implicit relationships between API functions in the calling sequence. The generated models exposed a behavioral contrast between malicious and non-malicious calling sequences. Based on that distinction, we built different relational perspective models that characterize processes' behaviors. To validate our approach, we evaluated it on both the Windows and Android platforms. Our experiments demonstrated that the proposed system identified unseen malicious samples with high accuracy and a low false-positive rate. In terms of detection accuracy, our model achieved average accuracies of 0.997 and 0.977 on unseen Windows and Android malware test samples, respectively. Moreover, we proposed a new indexing method for APIs based on their contextual similarities. We also suggested a new expressive, visual form for rendering the API calling sequence, and we introduced a confidence metric for our model's classification decisions. Furthermore, we developed a behavioral heuristic that effectively identified malicious API call sequences that were deceptive or relied on mimicry.
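
As a rough illustration of the idea of class-level behavioral graphs, the Python sketch below builds one API transition graph per class from a few toy call sequences and compares how well each graph explains a new sequence. It is a minimal sketch under stated assumptions, not the authors' method: the first-order transition model, the Laplace smoothing, the `TransitionGraph` and `score` names, and the toy sequences are all hypothetical, and the abstract does not specify how the statistical, contextual, and graph mining features are actually fused.

```python
from collections import Counter
from math import log


class TransitionGraph:
    """Weighted directed graph of API -> next-API transitions for one class.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, sequences, vocab, alpha=1.0):
        self.edges = Counter()  # (a, b) -> number of times b directly follows a
        self.out = Counter()    # a -> total outgoing transitions from a
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.edges[(a, b)] += 1
                self.out[a] += 1
        self.vocab_size = len(vocab)
        self.alpha = alpha      # Laplace smoothing constant (assumed value)

    def log_prob(self, seq):
        """Sum of smoothed log P(next API | current API) over the sequence."""
        total = 0.0
        for a, b in zip(seq, seq[1:]):
            num = self.edges[(a, b)] + self.alpha
            den = self.out[a] + self.alpha * self.vocab_size
            total += log(num / den)
        return total


def score(seq, mal_graph, ben_graph):
    """Return (label, margin).

    The margin is the mean per-transition log-likelihood ratio between the
    malicious and benign graphs; its magnitude serves here as a crude,
    assumed stand-in for a confidence value.
    """
    n = max(len(seq) - 1, 1)
    margin = (mal_graph.log_prob(seq) - ben_graph.log_prob(seq)) / n
    return ("malicious" if margin > 0 else "benign"), margin


# Toy training sequences (invented for this example only).
malicious = [
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
]
benign = [
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "CloseHandle"],
]
vocab = {api for seq in malicious + benign for api in seq}

mal_graph = TransitionGraph(malicious, vocab)
ben_graph = TransitionGraph(benign, vocab)

label_a, margin_a = score(
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory"], mal_graph, ben_graph
)
label_b, margin_b = score(
    ["CreateFileW", "ReadFile", "CloseHandle"], mal_graph, ben_graph
)
print(label_a, round(margin_a, 3))
print(label_b, round(margin_b, 3))
```

Because the two graphs are built once per class rather than once per process, scoring a new sequence only requires walking its transitions, which is the practical motivation the abstract gives for generic behavioral models.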
