MOLESTRA: A Multi-Task Learning Approach for Real-Time Big Data Analytics

Modern critical infrastructures are highly complex in terms of their vulnerabilities, threats, and interdependencies. The causes of a digital attack are not simple to identify, as they may stem from a chain of seemingly insignificant incidents whose combination provokes cascading effects on multiple levels. Similarly, the rapid growth of technologies related to critical infrastructure, together with the technical characteristics of their subsystems, entails the continuous production of huge amounts of data from heterogeneous sources, which requires intelligent techniques for critical analysis and optimal decision making. In many applications (e.g. network traffic monitoring), data arrives at a high frequency over time. It is therefore not possible to store all historical samples, which implies that they must be processed in real time and that old samples may not be available for review (one-pass constraint). Protecting critical infrastructure is all the more important because many of these systems are targets of cyber-attacks, yet they cannot easily be disconnected from their layout, as this could lead to generalized operational problems. This research paper proposes a Multi-Task Learning model for Real-Time & Large-Scale Data Analytics, aimed at the cyber protection of critical infrastructure. More specifically, it proposes Multi Overlap LEarning STReaming Analytics (MOLESTRA), which is a standardization of the "Kappa" architecture. The aim is the analysis of large data sets in which the tasks are executed in an overlapping manner, so as to exploit the cognitive or learning relationships among the data flows. The proposed architecture uses the k-NN Classifier with Self Adjusting Memory (k-NN SAM).
MOLESTRA provides a clear and effective way to separate the short-term from the long-term memory. In this way, the temporal intervals between the transfer of knowledge from one memory to the other, and vice versa, are differentiated.

Keywords: “Kappa” Architecture, Multi-Task Learning, Big Data, Data Streams, Critical Infrastructure Protection, Advanced Persistent Threat
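To make the one-pass constraint and the split between short-term and long-term memory more concrete, the following is a minimal sketch in Python. It is not the k-NN SAM algorithm or the MOLESTRA implementation: the class `DualMemoryKNN`, the function `prequential_accuracy`, the fixed memory sizes, and the simple overflow rule for moving samples from one memory to the other are all assumptions made for illustration. The actual k-NN SAM adapts its memories with more elaborate rules.

```python
from collections import Counter, deque
import math


class DualMemoryKNN:
    """Toy k-NN with a bounded short-term and long-term memory (illustrative only)."""

    def __init__(self, k=3, stm_size=50, ltm_size=200):
        self.k = k
        self.stm_size = stm_size
        self.stm = deque()                  # most recent samples
        self.ltm = deque(maxlen=ltm_size)   # older samples; oldest dropped when full

    def _vote(self, memory, x):
        # Majority label among the k nearest stored samples (Euclidean distance).
        if not memory:
            return None
        nearest = sorted(memory, key=lambda s: math.dist(s[0], x))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def predict(self, x):
        # Prefer recent knowledge; fall back to long-term memory when STM is empty.
        pred = self._vote(self.stm, x)
        return pred if pred is not None else self._vote(self.ltm, x)

    def learn(self, x, y):
        # One-pass update: store the sample, then move any overflow from STM to LTM.
        self.stm.append((tuple(x), y))
        while len(self.stm) > self.stm_size:
            self.ltm.append(self.stm.popleft())


def prequential_accuracy(model, stream):
    # Test-then-train: each sample is predicted once, then learned, never revisited.
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        model.learn(x, y)
        total += 1
    return correct / total if total else 0.0
```

Because every sample is seen exactly once and both memories are bounded, storage stays constant regardless of how long the stream runs, which is the property the one-pass constraint demands.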
