Language Runtime and Optimizations in IBM Streams

Stream processing is important for continuously transforming and analyzing the deluge of data that has revolutionized our world. Given the diversity of application domains, streaming applications must be both easy to write and performant. Both goals can be accomplished by high-level programming languages. Dedicated language syntax helps express stream programs clearly and concisely, whereas the compiler and runtime system of the language help optimize runtime performance. This paper describes the language runtime for the IBM Streams Processing Language (SPL) used to program the distributed IBM Streams platform. It gives a system overview and explains several language-based optimizations implemented in the SPL runtime: fusion, thread placement, fission, and transport optimizations.
