A Universal Calculus for Stream Processing Languages

Stream processing applications such as algorithmic trading, MPEG processing, and web content analysis are ubiquitous and essential to business and entertainment. Language designers have developed numerous domain-specific languages that are both tailored to the needs of their applications, and optimized for performance on their particular target platforms. Unfortunately, the goals of generality and performance are frequently at odds, and prior work on the formal semantics of stream processing languages does not capture the details necessary for reasoning about implementations. This paper presents Brooklet, a core calculus for stream processing that allows us to reason about how to map languages to platforms and how to optimize stream programs. We translate from three representative languages, CQL, StreamIt, and Sawzall, to Brooklet, and show that the translations are correct. We formalize three popular and vital optimizations, data-parallel computation, operator fusion, and operator re-ordering, and show under which conditions they are correct. Language designers can use Brooklet to specify exactly how new features or languages behave. Language implementors can use Brooklet to show exactly under which circumstances new optimizations are correct. In ongoing work, we are developing an intermediate language for streaming that is based on Brooklet. We are implementing our intermediate language on System S, IBM's high-performance streaming middleware.

[1]  Ralf Lämmel,et al.  Google's MapReduce programming model - Revisited , 2007, Sci. Comput. Program..

[2]  Leonidas Fegaras,et al.  Optimizing Queries with Object Updates , 1999, Journal of Intelligent Information Systems.

[3]  Henry Hoffmann,et al.  MPEG-2 decoding in a stream programming language , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[4]  Michael Isard,et al.  DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language , 2008, OSDI.

[5]  Jennifer Widom,et al.  A denotational semantics for continuous queries over streams and relations , 2004, SGMD.

[6]  Martin C. Rinard,et al.  Commutativity analysis: a new analysis framework for parallelizing compilers , 1996, PLDI '96.

[7]  Jan Van den Bussche,et al.  A Theory of Stream Queries , 2007, DBPL.

[8]  E.A. Lee,et al.  Synchronous data flow , 1987, Proceedings of the IEEE.

[9]  Philip S. Yu,et al.  SPADE: the system s declarative stream processing engine , 2008, SIGMOD Conference.

[10]  Benjamin C. Pierce,et al.  Types and programming languages: the next generation , 2003, 18th Annual IEEE Symposium of Logic in Computer Science, 2003. Proceedings..

[11]  Henry Hoffmann,et al.  StreamIt: A Compiler for Streaming Applications ⁄ , 2002 .

[12]  Douglas B. Terry,et al.  Continuous queries over append-only databases , 1992, SIGMOD '92.

[13]  Flemming Nielson,et al.  Semantics with applications - a formal introduction , 1992, Wiley professional computing.

[14]  David J. DeWitt,et al.  NiagaraCQ: a scalable continuous query system for Internet databases , 2000, SIGMOD '00.

[15]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[16]  Nicola Onose,et al.  XML query optimization in the presence of side effects , 2008, SIGMOD Conference.

[17]  C. A. R. Hoare,et al.  Communicating sequential processes , 1978, CACM.

[18]  Gilles Kahn,et al.  The Semantics of a Simple Language for Parallel Programming , 1974, IFIP Congress.

[19]  Pat Hanrahan,et al.  Brook for GPUs: stream computing on graphics hardware , 2004, SIGGRAPH 2004.

[20]  Ravi Kumar,et al.  Pig latin: a not-so-foreign language for data processing , 2008, SIGMOD Conference.

[21]  Joe D. Warren,et al.  The program dependence graph and its use in optimization , 1984, TOPL.

[22]  Robert Stephens,et al.  A survey of stream processing , 1997, Acta Informatica.

[23]  Rob Pike,et al.  Interpreting the data: Parallel analysis with Sawzall , 2005, Sci. Program..

[24]  William Thies,et al.  StreamIt: A Language for Streaming Applications , 2002, CC.

[25]  Philip Wadler,et al.  Featherweight Java: a minimal core calculus for Java and GJ , 1999, OOPSLA '99.

[26]  Jennifer Widom,et al.  The CQL continuous query language: semantic foundations and query execution , 2006, The VLDB Journal.