Paper information - TurboStream: Towards Low-Latency Data Stream Processing

Data Stream Processing (DSP) applications are often modelled as a directed acyclic graph in which operators are connected by data streams. Inter-operator communication can have a significant impact on the latency of DSP applications, accounting for 86% of the total latency. Despite this impact, there has been relatively little work on optimizing inter-operator communication; existing work focuses on reducing inter-node traffic and does not consider inter-process communication (IPC) inside a node, which often incurs high latency because of multiple memory-copy operations. This paper describes the design and implementation of TurboStream, a new DSP system designed specifically to address the high latency caused by inter-operator communication. To achieve this goal, we introduce (1) an improved IPC framework with OSRBuffer, a DSP-oriented buffer, which reduces the memory-copy operations and the waiting time of each message transmitted between operators inside one node, and (2) a coarse-grained scheduler that consolidates operator instances and assigns them to nodes so as to reduce inter-node IPC traffic. Using a prototype implementation, we show that the improved IPC framework reduces the end-to-end latency of intra-node IPC by 45.64% to 99.30%. Moreover, TurboStream reduces the latency of DSP by 83.23% compared to JStorm.
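The abstract does not spell out how the coarse-grained scheduler decides which operator instances to consolidate. As a rough illustration of the general idea only, the sketch below greedily co-locates the instance pairs that exchange the most messages, so that their traffic stays inside one node. The function names `place_instances` and `inter_node_traffic`, the `traffic` dictionary, and the `capacity` limit are all hypothetical and are not taken from the paper; this is a generic greedy heuristic, not TurboStream's scheduler.

```python
def place_instances(traffic, capacity):
    """Greedily co-locate heavily communicating operator instances.

    traffic:  dict mapping (src, dst) instance pairs to a message rate
              (hypothetical input format, assumed for this sketch)
    capacity: maximum number of instances per node (assumed limit)
    Returns a dict mapping each instance name to a node index.
    """
    group_of = {}   # instance -> identifier of the group it belongs to
    members = {}    # group identifier -> list of instances in the group

    # Start with every instance in its own group.
    for a, b in traffic:
        for inst in (a, b):
            if inst not in group_of:
                group_of[inst] = inst
                members[inst] = [inst]

    # Visit edges from heaviest to lightest and merge the two groups
    # whenever the merged group still fits on a single node.
    for (a, b), _rate in sorted(traffic.items(), key=lambda kv: -kv[1]):
        ga, gb = group_of[a], group_of[b]
        if ga != gb and len(members[ga]) + len(members[gb]) <= capacity:
            for inst in members[gb]:
                group_of[inst] = ga
            members[ga].extend(members.pop(gb))

    # Simplification: each remaining group gets its own node, with no
    # further packing of small leftover groups.
    node_ids = {g: i for i, g in enumerate(members)}
    return {inst: node_ids[g] for inst, g in group_of.items()}


def inter_node_traffic(traffic, placement):
    """Sum the rates of all edges whose endpoints sit on different nodes."""
    return sum(rate for (a, b), rate in traffic.items()
               if placement[a] != placement[b])


if __name__ == "__main__":
    # Hypothetical four-operator pipeline with made-up message rates.
    demo = {("spout", "parse"): 100, ("parse", "count"): 80, ("count", "sink"): 10}
    placement = place_instances(demo, capacity=2)
    print(placement, inter_node_traffic(demo, placement))
```

With two slots per node, the heaviest edge is kept local first, and the next-heaviest edge crosses nodes because merging would exceed the capacity. A production scheduler would also need to account for CPU and memory load, which this sketch ignores.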
