AI-Driven Cybersecurity: An Overview, Security Intelligence Modeling and Research Directions

Artificial Intelligence (AI) is one of the key technologies of the Fourth Industrial Revolution (Industry 4.0) and can be used to protect Internet-connected systems from cyber-threats, attacks, damage, or unauthorized access. To intelligently solve today’s various cybersecurity issues, popular AI techniques can be used, including Machine Learning (ML) and Deep Learning (DL) methods, Natural Language Processing (NLP), Knowledge Representation and Reasoning (KRR), and knowledge- or rule-based Expert Systems (ES) modeling. Based on these AI methods, in this paper we present a comprehensive view of “AI-driven Cybersecurity”, which can play an important role in intelligent cybersecurity services and management. Security intelligence modeling based on such AI methods can make the cybersecurity computing process more automated and intelligent than conventional security systems. We also highlight several research directions within the scope of our study, which can help researchers conduct future research in the area. Overall, this paper’s ultimate objective is to serve as a reference point and guideline for cybersecurity researchers as well as industry professionals in the area, especially from an AI-based technical point of view.
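
To make the idea of rule-based Expert System (ES) modeling more concrete, the following is a minimal forward-chaining sketch in Python. It is an illustration under stated assumptions, not the model described in this paper. The rules, the fact names (for example many_failed_logins), and the forward_chain helper are hypothetical. In practice, the rules of such a system would come from domain experts or from rule-mining methods.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule fires when all of its condition facts are present; it then adds its conclusion.
Rule = Tuple[FrozenSet[str], str]

# Hypothetical, illustrative security rules (not taken from the paper).
RULES: List[Rule] = [
    (frozenset({"many_failed_logins", "single_source_ip"}), "brute_force_suspected"),
    (frozenset({"brute_force_suspected", "login_success_after_failures"}),
     "account_compromise_suspected"),
    (frozenset({"account_compromise_suspected"}), "action:lock_account"),
]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply rules repeatedly until no new facts can be derived (a fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Subset test: every condition fact must already be known.
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


if __name__ == "__main__":
    observed = {"many_failed_logins", "single_source_ip", "login_success_after_failures"}
    for fact in sorted(forward_chain(observed, RULES) - observed):
        print(fact)
```

Chaining matters here. The compromise conclusion only becomes reachable after the brute-force rule has fired. A real ES would add conflict resolution, certainty factors, or fuzzy matching on top of this loop.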
