Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems

Stream-processing workloads and modern shared cluster environments exhibit high variability and unpredictability. Combined with the large parameter space and the diverse set of user SLOs, this makes modern streaming systems very challenging to statically configure and tune. To address these issues, in this paper we investigate a novel control-plane design, Chi, which supports continuous monitoring and feedback, and enables dynamic re-configuration. Chi leverages the key insight of embedding control-plane messages in the data-plane channels to achieve a low-latency and flexible control plane for stream-processing systems. Chi introduces a new reactive programming model and design mechanisms to asynchronously execute control policies, thus avoiding global synchronization. We show how this allows us to easily implement a wide spectrum of control policies targeting different use cases observed in production. Large-scale experiments using production workloads from a popular cloud provider demonstrate the flexibility and efficiency of our approach.
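The key insight above, carrying control messages in the same channels as the data, can be illustrated with a minimal sketch. This is not Chi's actual API or implementation. The names `Data`, `Control`, `Operator`, and `run_pipeline`, and the notion of a per-operator `factor` setting, are hypothetical and chosen only for illustration. The sketch assumes FIFO channels between operators, so each operator applies a configuration change at the same logical point in the stream without any global barrier.

```python
from dataclasses import dataclass


@dataclass
class Data:
    """A regular data-plane record (hypothetical type)."""
    value: int


@dataclass
class Control:
    """A control-plane message carrying a new setting (hypothetical type)."""
    factor: int


class Operator:
    """Toy operator: multiplies each record by its current factor."""

    def __init__(self, factor, downstream=None):
        self.factor = factor
        self.downstream = downstream
        self.output = []  # only filled by the last operator in the chain

    def receive(self, msg):
        if isinstance(msg, Control):
            # Reconfigure locally, then forward the control message in-band
            # so downstream operators switch at the same stream position.
            self.factor = msg.factor
            if self.downstream is not None:
                self.downstream.receive(msg)
            return
        result = Data(msg.value * self.factor)
        if self.downstream is not None:
            self.downstream.receive(result)
        else:
            self.output.append(result.value)


def run_pipeline(messages):
    """Feed a mixed data/control stream through a two-operator pipeline."""
    sink = Operator(factor=1)
    source = Operator(factor=1, downstream=sink)
    for msg in messages:
        source.receive(msg)
    return sink.output


stream = [Data(1), Data(2), Control(factor=10), Data(3)]
print(run_pipeline(stream))
```

In this toy run, the records that precede the control message are processed under the old setting at both operators, and the record that follows it sees the new setting at both, even though neither operator coordinates with the other beyond reading its input channel in order.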
