A Software Chain Approach to Big Data Stream Processing and Analytics

Big Data Stream processing is among the most important computing trends today. The growing interest in it comes from the needs of many Internet-based applications that generate huge data streams, whose processing can yield useful analytics and inform decision-making systems. For instance, an IoT-based monitoring system for a supply chain can provide real-time analytics on business delivery performance. The challenge of processing Big Data Streams lies in coping with real-time processing of an unbounded stream of data: the computing system must sustain a throughput high enough to accommodate the rate at which input data is generated. Clearly, the higher the data stream rate, the higher the throughput must be to achieve consistency of the processing results (e.g., preserving the order of events in the data stream). In this paper we show how to map the data stream processing phases (from data generation to final results) to a software chain architecture, which comprises five main components: sensor, extractor, parser, formatter and outputter. We exemplify the approach using Yahoo!S4 to process the Big Data Stream from the Flight Radar24 global flight monitoring system.
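As an illustration of the five-component chain named above, the following is a minimal sketch in Python, not the implementation described in the paper and not the Yahoo!S4 API. Only the stage names (sensor, extractor, parser, formatter, outputter) come from the text; the message layout, the field names and every function body are hypothetical and chosen only to make the data flow visible. Each stage is a generator, so events move through the chain one at a time and in arrival order, which is how an unbounded stream would be handled.

```python
import json
from typing import Dict, Iterable, Iterator, List


def sensor() -> Iterator[str]:
    """Hypothetical source. A real sensor would yield messages indefinitely."""
    raw_messages = [
        "HDR|ts=1;flight=AB123;alt=11000",
        "noise-without-payload",
        "HDR|ts=2;flight=CD456;alt=9000",
        "HDR|ts=3;flight=AB123;alt=10500",
    ]
    yield from raw_messages


def extractor(messages: Iterable[str]) -> Iterator[str]:
    """Keep only the payload after the (assumed) 'HDR|' envelope; drop the rest."""
    for message in messages:
        _header, separator, payload = message.partition("|")
        if separator and payload:
            yield payload


def parser(payloads: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Turn 'key=value;key=value' payloads into typed event dictionaries."""
    for payload in payloads:
        fields = dict(item.split("=", 1) for item in payload.split(";"))
        yield {
            "ts": int(fields["ts"]),
            "flight": fields["flight"],
            "alt": int(fields["alt"]),
        }


def formatter(events: Iterable[Dict[str, object]]) -> Iterator[str]:
    """Serialize each event to one JSON line with a stable key order."""
    for event in events:
        yield json.dumps(event, sort_keys=True)


def outputter(lines: Iterable[str]) -> List[str]:
    """Collect formatted lines; a real outputter would write to a sink."""
    return list(lines)


# Compose the chain: each stage consumes the previous stage's output.
results = outputter(formatter(parser(extractor(sensor()))))
for line in results:
    print(line)
```

Because every stage only pulls from the one before it, a stage can be replaced (for example, a different parser for another input format) without touching the others. In a distributed setting each stage would be deployed as a separate processing element, which this single-process sketch does not attempt to show.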
