Poisoning Attacks and Data Sanitization Mitigations for Machine Learning Models in Network Intrusion Detection Systems

Among the many real-world application domains of machine learning, cyber security stands to benefit from more automated techniques for combating sophisticated adversaries. Modern network intrusion detection systems apply machine learning models to network logs to proactively detect cyber attacks. However, the risk of adversarial attacks against machine learning in these cyber settings remains underexplored. In this paper, we investigate training-time poisoning attacks against machine learning models in constrained cyber environments such as network intrusion detection, and we explore mitigations based on training data sanitization. We consider poisoning availability attacks, in which an attacker inserts a set of poisoned samples at training time with the goal of degrading the accuracy of the deployed model. We design a white-box, realizable poisoning attack that reduces model accuracy from 95% to less than 50% by generating mislabeled samples in the close vicinity of a selected subset of training points. We also propose a novel Nested Training method as a defense against these attacks. Our defense constructs a diversified ensemble of classifiers, each trained on a different subset of the training set, and uses the disagreement of the classifiers' predictions to sanitize the training data. We show that an ensemble of 10 SVM classifiers is resilient to a large fraction of poisoning samples, up to 30% of the training data.
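
The following is a minimal sketch of the kind of poisoning considered here: mislabeled samples generated near existing training points. It is illustrative only, not the paper's attack; the paper's attack is white-box and selects points to maximize accuracy degradation under realizability constraints, whereas this sketch picks the subset at random. The function name poison_label_flip and the parameters frac and eps are hypothetical, and binary {0,1} labels are assumed.

```python
import numpy as np

def poison_label_flip(X, y, frac=0.3, eps=0.05, seed=None):
    """Illustrative poisoning: create mislabeled samples close to a
    randomly chosen subset of training points and append them.

    Assumes X is an (n, d) float array and y is an (n,) array of
    binary {0, 1} labels. frac controls the poison budget relative
    to the clean set; eps controls how far the poisons may drift
    from the clean points they copy.
    """
    rng = np.random.default_rng(seed)
    n_poison = int(frac * len(X))
    idx = rng.choice(len(X), size=n_poison, replace=False)
    # Small Gaussian perturbation keeps poisons in the vicinity
    # of legitimate points, so they look plausible to the defender.
    X_p = X[idx] + eps * rng.standard_normal(X[idx].shape)
    y_p = 1 - y[idx]  # flip the binary label
    return np.vstack([X, X_p]), np.concatenate([y, y_p])
```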
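
And here is one simple instantiation of the disagreement-based sanitization idea behind the defense, under stated assumptions: each classifier is trained on a random half of the data (the paper's Nested Training uses a specific nested subset construction, simplified away here), and a point is discarded when a majority of the ensemble disputes its given label. The function name disagreement_sanitize and the threshold parameter are illustrative, not from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def disagreement_sanitize(X, y, n_models=10, threshold=0.5, seed=None):
    """Illustrative sanitization: train an ensemble of SVMs on
    different random subsets, then drop training points whose label
    is disputed by more than `threshold` of the ensemble.

    Assumes X is an (n, d) float array and y an (n,) label array.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Each model sees a different random half of the training set,
        # so a bounded fraction of poisons cannot corrupt all models.
        idx = rng.choice(n, size=n // 2, replace=False)
        models.append(SVC(kernel="linear").fit(X[idx], y[idx]))
    # preds has shape (n_models, n): one row of predictions per model.
    preds = np.stack([m.predict(X) for m in models])
    # Fraction of models whose prediction disagrees with the given label.
    disagree = (preds != y).mean(axis=0)
    keep = disagree <= threshold
    return X[keep], y[keep]
```

A retraining step on the sanitized (X, y) would follow; the 30% resilience figure in the abstract refers to the paper's full method, not to this sketch.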