A Generic Architectural Framework for Machine Learning on Data Streams

In recent years, the importance of processing data streams has increased with the emergence of new technologies and application domains. The Internet of Things provides many examples in which processing and analyzing data streams are critical success factors. With the growing amount of data, the use of machine learning (ML) algorithms has become an essential part of data analysis. However, the high volume and velocity of data present new challenges that need to be addressed, e.g., frequent model changes, concept drift, or insufficient time to train models. From our point of view, these challenges cannot be tackled by an algorithm-centric approach alone, i.e., by focusing solely on finding appropriate algorithms while neglecting the structure of the overall processing system.
