Systematic mapping for big data stream processing frameworks

The choice of a stream processing framework (SPF) for Big Data has been widely discussed. Each SPF applies different cutting-edge technologies at the various steps of processing data in motion, which gives it advantages over the others. Even so, it remains hard to say which framework outperforms the rest under different scenarios and conditions. In this study, we aim to show trends and differences among several SPFs for Big Data using the Systematic Mapping (SM) approach. To achieve our objectives, we raise six research questions (RQs), against which 91 studies conducted between 2010 and 2015 were evaluated. We present the trends by classifying the research on SPFs with respect to the proposed RQs, which can help researchers obtain an overview of the field.
