Performance Comparison and Current Challenges of Using Machine Learning Techniques in Cybersecurity

Cyberspace has become indispensable to all areas of the modern world, and everyday life depends more and more on the internet. This growing dependency has also widened exposure to malicious threats. As cybersecurity risks grow, cybersecurity has become a pivotal element in defending the cyber world against threats, attacks, and fraud, and the expanding cyberspace is highly exposed to a growing volume of cyber threats. The objective of this survey is to provide a brief review of machine learning (ML) techniques and of the developments made in detection methods for potential cybersecurity risks. These detection methods mainly comprise fraud detection, intrusion detection, spam detection, and malware detection. In this review paper, we build upon the existing literature on applications of ML models in cybersecurity and provide a comprehensive review of ML techniques in cybersecurity. To the best of our knowledge, this is the first attempt to compare the time complexity of commonly used ML models in cybersecurity. We comprehensively compare each classifier's performance based on frequently used datasets and sub-domains of cyber threats. This work also provides a brief introduction to machine learning models and to commonly used security datasets. Despite its primary importance, cybersecurity has its constraints, compromises, and challenges. This work also expounds on the current challenges and limitations faced when applying machine learning techniques in cybersecurity.
