Testing properties of dataflow program operators

Dataflow programming languages, which represent programs as graphs of data streams and operators, are increasingly popular and are used to build a wide range of commercial software applications. The dependability of programs written in these languages, and of the systems that compile and run them, hinges on the correctness of the semantic properties associated with operators. Unfortunately, these properties are often poorly defined and frequently go unchecked, which can lead to a wide range of problems in the programs that use the operators. In this paper, we present an approach for improving the dependability of dataflow programs by checking operators for necessary properties. Our approach is dynamic: it generates tests and checks their results to determine whether specific properties hold. We present empirical data showing that our approach is both effective and efficient at assessing the status of properties.
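
To make the idea of dynamic property checking concrete, the following minimal sketch tests one illustrative property, order insensitivity, by generating random input streams and comparing outputs. It is an assumption-laden illustration, not the approach evaluated in the paper: the operators (`keep_even`, `running_sum`), the property (`order_insensitive`), and the driver (`holds_on_samples`) are hypothetical names, and operators are modeled simply as Python functions from a list to a list.

```python
import random
from collections import Counter


def holds_on_samples(operator, prop, trials=200, seed=0):
    """Run `prop` on randomly generated streams.

    Returns (True, None) if no counterexample was found, otherwise
    (False, stream) with the first failing input stream.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        stream = [rng.randint(-50, 50) for _ in range(length)]
        if not prop(operator, stream, rng):
            return False, stream
    return True, None


def order_insensitive(operator, stream, rng):
    """Hypothetical property: permuting the input leaves the output
    unchanged when outputs are compared as multisets."""
    shuffled = stream[:]
    rng.shuffle(shuffled)
    return Counter(operator(stream)) == Counter(operator(shuffled))


def keep_even(stream):
    """Stateless filter operator (illustrative)."""
    return [x for x in stream if x % 2 == 0]


def running_sum(stream):
    """Stateful operator whose output depends on arrival order (illustrative)."""
    total, out = 0, []
    for x in stream:
        total += x
        out.append(total)
    return out


if __name__ == "__main__":
    print(holds_on_samples(keep_even, order_insensitive))
    print(holds_on_samples(running_sum, order_insensitive))
```

A result of `True` here means only that no counterexample was found among the sampled inputs; it is evidence that the property holds, not a proof. A result of `False` comes with a concrete input stream that refutes the property.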
