DiffStream: differential output testing for stream processing programs

High-performance architectures for processing distributed data streams, such as Flink, Spark Streaming, and Storm, are increasingly deployed in emerging data-driven computing systems. Exploiting the parallelism afforded by such platforms while preserving the semantics of the desired computation is prone to errors, and motivates the development of tools for specification, testing, and verification. We focus on the problem of differential output testing for distributed stream processing systems: checking whether two implementations produce equivalent output streams in response to a given input stream. The notion of equivalence allows reordering of logically independent data items, and the main technical contribution of the paper is an optimal online algorithm for checking this equivalence. Our testing framework is implemented as a library called DiffStream in Flink. We present four case studies to illustrate how our framework can be used to (1) correctly identify bugs in a set of benchmark MapReduce programs, (2) facilitate the development of difficult-to-parallelize high-performance applications, and (3) monitor an application over a long period of time with minimal performance overhead.
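To make the notion of "equivalence up to reordering of independent items" concrete, the following is a minimal, hypothetical sketch of an online checker in Python. It is a simplified greedy matcher, not the paper's optimal algorithm, and the class and parameter names (`StreamEqChecker`, `independent`, `push`) are illustrative assumptions, not DiffStream's actual API. Each stream keeps a queue of unmatched items; an incoming item may match an equal pending item of the other stream only if it commutes with every earlier unmatched item there.

```python
# Illustrative sketch (not the paper's algorithm): online check that two
# streams are equal up to reordering of independent items.

class StreamEqChecker:
    def __init__(self, independent):
        # independent(x, y) -> True if x and y may be freely reordered
        self.independent = independent
        self.pending = {1: [], 2: []}  # unmatched items per stream

    def push(self, stream, item):
        """Feed the next item of `stream` (1 or 2).

        Returns False as soon as the streams are provably inequivalent
        (two dependent items observed in opposite orders), True otherwise.
        """
        other = self.pending[2 if stream == 1 else 1]
        for i, y in enumerate(other):
            if y == item:
                # A match is legal only if `item` commutes with every
                # earlier unmatched item of the other stream.
                if all(self.independent(item, z) for z in other[:i]):
                    other.pop(i)
                    return True
                # A dependent item precedes the match: order violation.
                return False
        self.pending[stream].append(item)
        return True

    def finished(self):
        # When both streams end, every item must have been matched.
        return not self.pending[1] and not self.pending[2]


# Example: key-value outputs commute iff their keys differ.
independent = lambda x, y: x[0] != y[0]

ok = StreamEqChecker(independent)
ok.push(1, ('a', 1)); ok.push(2, ('b', 1))   # different keys, any order
ok.push(2, ('a', 1)); ok.push(1, ('b', 1))
assert ok.finished()

bad = StreamEqChecker(independent)
bad.push(1, ('a', 1)); bad.push(1, ('a', 2))  # same key: order matters
assert bad.push(2, ('a', 2)) is False         # reordered dependent items
```

This greedy matcher illustrates the specification; the paper's contribution is an optimal online algorithm for the same equivalence check, which this sketch does not reproduce.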