Conceptual Survey on Data Stream Processing Systems

This paper gives an overview of state-of-the-art technology in the area of data stream processing systems. Although stream processing is not a new field, it is receiving greater interest in light of current business trends such as the Internet of Things (IoT). The comparison covers several aspects of the systems, including their architectures and the responsibilities of their individual components. Ranking the systems or recommending any of them is outside the scope of this work.
