Investigation of AWSCTD dataset applicability for malware type classification

Information systems security is now crucial: a single vulnerable endpoint can lead to severe data loss. Intrusion detection systems (IDS) are used to detect such events. The place of deployment defines the type of IDS: network-based (NIDS), which monitors network traffic, or host-based (HIDS), which detects malicious actions at the host level. An IDS can be effective only if the generated alerts are correctly evaluated and classified, which is typically done by trained staff and requires a lot of time and human resources. While a lot of research has been done on the evaluation of NIDS alerts, HIDS research is lagging behind. Operating system calls reported by a HIDS could be used to define the importance of alarms and steer analysts to the most critical issues. In this article we demonstrate the applicability of the Attack-Caused Windows System Calls Traces Dataset (AWSCTD), which we created and which is currently the most comprehensive dataset of system calls generated by almost all modern malware types, for training different classification methods for malware type recognition and subsequent alert prioritization. The effectiveness of the different classification methods is evaluated, and the results are presented. The results achieved so far make it possible to decrease the load on analytical staff dealing with malware classification and related alert prioritization by 92.4%, which makes this approach applicable for practical use.
