Real-time big data processing for anomaly detection: A Survey

Abstract The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, leading to cyber-attacks, financial loss, information theft in healthcare, and cyber war. Hence, network security analytics has become an important area of concern and has lately gained intensive attention among researchers, specifically in the domain of network anomaly detection, which is considered crucial for network security. However, preliminary investigations have revealed that existing approaches to detecting anomalies in networks are not effective enough, particularly for detecting them in real time. This inefficacy is mainly due to the massive volumes of data amassed through connected devices. Therefore, it is crucial to propose a framework that effectively handles real-time big data processing and detects anomalies in networks. In this regard, this paper attempts to address the issue of detecting anomalies in real time. Accordingly, it surveys the state-of-the-art real-time big data processing technologies related to anomaly detection and the vital characteristics of the associated machine learning algorithms. The paper begins by explaining the essential context and taxonomy of real-time big data processing, anomaly detection, and machine learning algorithms, followed by a review of big data processing technologies. Finally, the identified research challenges of real-time big data processing in anomaly detection are discussed.
