BlackEye: automatic IP blacklisting using machine learning from security logs

Blacklisting malicious IP addresses is a primary technique for safeguarding mission-critical IT systems. The decision to blacklist an IP address requires careful examination of packet traffic data as well as the address's behavioral history. Most current security monitoring for IP blacklisting relies heavily on the domain expertise of experienced specialists. Although there have been efforts to apply machine-learning (ML) techniques to this problem, a mature solution has yet to emerge. To address these challenges and to gain a better understanding of the problem, we have designed the BlackEye framework, in which we can apply various ML techniques and produce models for accurate blacklisting. Our analysis shows that a multi-stage method that combines data cleansing with classification via logistic regression or random forest produces the best results. Our evaluation on real-world data shows that it can reduce incorrect blacklisting by nearly 90% compared to the performance of experts. Moreover, our proposed model performed well in terms of time-to-blacklist, shortening the period during which a malicious IP address remains active by 27 days on average.
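The multi-stage idea described above (data cleansing followed by a classifier) can be illustrated with a minimal sketch. This is not the BlackEye implementation: the record layout, the two features (alerts per day, distinct ports), the sample values, and every function name are assumptions made up for illustration, and plain gradient-descent logistic regression stands in for whatever training procedure the authors used.

```python
import math

# Hypothetical per-IP records: (ip, alerts_per_day, distinct_ports, label).
# label 1 = should be blacklisted, 0 = benign. All values are invented.
RAW = [
    ("198.51.100.7", 40.0, 12.0, 1),
    ("198.51.100.7", 40.0, 12.0, 1),   # exact duplicate
    ("203.0.113.5", 2.0, 1.0, 0),
    ("203.0.113.9", None, 3.0, 0),     # missing feature value
    ("192.0.2.44", 35.0, 9.0, 1),
    ("192.0.2.80", 1.0, 2.0, 0),
    ("192.0.2.81", 3.0, 1.0, 0),
    ("198.51.100.23", 50.0, 15.0, 1),
]

def cleanse(records):
    """Stage 1 (assumed): drop records with missing fields and exact duplicates."""
    seen, clean = set(), []
    for rec in records:
        if any(v is None for v in rec) or rec in seen:
            continue
        seen.add(rec)
        clean.append(rec)
    return clean

def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Stage 2: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (blacklist) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

if __name__ == "__main__":
    clean = cleanse(RAW)
    X = standardize([list(r[1:3]) for r in clean])
    y = [r[3] for r in clean]
    w, b = train_logreg(X, y)
    for rec, xi in zip(clean, X):
        print(rec[0], "blacklist" if predict(w, b, xi) else "allow")
```

The two stages are kept as separate functions so that the classifier could be swapped (for example, for a random forest) without touching the cleansing step.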
