AI-assisted Malware Analysis: A Course for Next Generation Cybersecurity Workforce

The use of Artificial Intelligence (AI) and Machine Learning (ML) to solve cybersecurity problems has been gaining traction within industry and academia, in part as a response to widespread malware attacks on critical systems, such as cloud infrastructures, government offices, and hospitals, and to the vast amounts of data they generate. AI- and ML-assisted cybersecurity offers data-driven automation that could enable security systems to identify and respond to cyber threats in real time. However, there is currently a shortfall of professionals trained in AI and ML for cybersecurity. We address this shortfall by developing lab-intensive modules that enable undergraduate and graduate students to gain fundamental and advanced knowledge in applying AI and ML techniques to real-world datasets, covering Cyber Threat Intelligence (CTI), malware analysis, and malware classification, among other important topics in cybersecurity. We describe six self-contained and adaptive modules in "AI-assisted Malware Analysis." The topics are: (1) CTI and malware attack stages, (2) malware knowledge representation and CTI sharing, (3) malware data collection and feature identification, (4) AI-assisted malware detection, (5) malware classification and attribution, and (6) advanced malware research topics and case studies, such as adversarial learning and Advanced Persistent Threat (APT) detection.
