A Study on Labeling Network Hostile Behavior with Intelligent Interactive Tools

Labeling a real network dataset is especially expensive in computer security, because an expert must weigh several factors before assigning each label. This paper describes an interactive intelligent system that supports the task of identifying hostile behaviors in network logs. The RiskID application uses visualizations to graphically encode features of network connections and to promote visual comparison. In the background, two algorithms actively organize connections and predict potential labels: a recommendation algorithm and a semi-supervised learning strategy. Together with interactive adaptations to the user interface, these algorithms constitute a behavior recommendation. A study with 16 participants analyzes how the recommendation and prediction algorithms influence the workflow of labeling a dataset. Its results indicate that behavior recommendation significantly improves the quality of labels. Analyzing interaction patterns, we identify a more intuitive workflow used when behavior recommendation is available.
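The abstract does not specify which semi-supervised strategy RiskID uses. As a purely hypothetical illustration of the general idea, the sketch below uses 1-nearest-neighbor self-training, a generic semi-supervised technique and not necessarily the authors' method. Starting from a few expert-labeled connections, it repeatedly gives the closest unlabeled connection the label of its nearest labeled neighbor and returns these as suggestions for the expert to confirm. The numeric feature vectors, the function name `self_train_1nn`, and the `max_distance` cutoff are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical numeric feature vector describing one network connection.
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_train_1nn(
    features: List[Vector],
    labels: Dict[int, str],
    max_distance: float = math.inf,
) -> Dict[int, str]:
    """Hypothetical 1-NN self-training sketch (not RiskID's actual algorithm).

    In each round, the unlabeled connection closest to any labeled one
    receives that neighbor's label and then acts as a labeled example
    itself. Stops when no unlabeled connection lies within max_distance.
    Returns only the suggested labels; expert labels are left untouched.
    """
    known = dict(labels)  # copy so the caller's dict is not mutated
    suggested: Dict[int, str] = {}
    unlabeled = [i for i in range(len(features)) if i not in known]
    while unlabeled:
        best: Optional[Tuple[float, int, str]] = None
        # Find the globally closest (unlabeled, labeled) pair.
        for u in unlabeled:
            for idx, lab in known.items():
                d = euclidean(features[u], features[idx])
                if best is None or d < best[0]:
                    best = (d, u, lab)
        # No labeled seeds, or nothing close enough: leave the rest to the expert.
        if best is None or best[0] > max_distance:
            break
        _, u, lab = best
        known[u] = lab
        suggested[u] = lab
        unlabeled.remove(u)
    return suggested
```

For example, with features `[[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 10.0]]`, expert labels `{0: "normal", 2: "attack"}` and `max_distance=1.0`, connections 1 and 3 receive suggested labels while the outlying connection 4 stays unlabeled for the expert to inspect. The distance cutoff is a design choice that keeps suggestions conservative, which suits an interactive setting where the expert has the final say.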
