Parallelizing user-defined aggregations using symbolic execution

User-defined aggregations (UDAs) are integral to large-scale data-processing systems, such as MapReduce and Hadoop, because they let programmers express application-specific aggregation logic. System-supported associative aggregations, such as counting or finding the maximum, are data-parallel, so these systems optimize their execution, in many cases yielding orders-of-magnitude performance improvements. These optimizations, however, are not possible for arbitrary UDAs.

This paper presents Symple, a system for performing MapReduce-style group-by-aggregate queries that automatically parallelizes UDAs. Users specify UDAs using stylized C++ code that may contain loop-carried dependences. Symple parallelizes these UDAs by breaking dependences using symbolic execution: unresolved dependences are treated as symbolic values, and the Symple runtime partially evaluates the resulting symbolic expressions on concrete input. Programmers write UDAs using Symple's symbolic data types, which look and behave like standard C++ types. These data types (i) encode specialized decision procedures for efficient symbolic execution and (ii) generate compact symbolic expressions for efficient network transfers. Evaluation on both Amazon's Elastic Compute Cloud (EC2) and a private 380-node Hadoop cluster housing terabytes of data demonstrates that Symple reduces network communication by up to several orders of magnitude and job latency by as much as 5.9x for a representative set of queries.
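To make the idea concrete, the following is a minimal sketch in plain C++ of how symbolic execution can break a loop-carried dependence in a UDA; it is not Symple's actual API, and the example UDA (a running balance clamped at zero), along with the names PartitionSummary, runSymbolically, and compose, are illustrative assumptions rather than anything from the paper. The point it demonstrates is that evaluating the loop body against a symbolic initial state collapses each input partition to a compact summary, max(s + shift, floor), and such summaries compose, so partitions can be processed in parallel and merged afterwards.

```cpp
// Minimal sketch (not Symple's actual API) of symbolically parallelizing a
// UDA with a loop-carried dependence.
//
// UDA: a running balance over signed deltas, clamped at zero:
//     balance = max(balance + delta, 0), starting from balance = 0.
//
// Evaluated against a *symbolic* initial balance s, the effect of any
// non-empty partition collapses to f(s) = max(s + shift, floor), so each
// worker ships only the pair (shift, floor), and adjacent summaries compose.

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Symbolic summary of one partition: f(s) = max(s + shift, floor).
struct PartitionSummary {
    int64_t shift;   // net additive effect of the partition's deltas
    int64_t floor;   // lowest value the clamp can force the result up to
};

// Evaluate the UDA body over one (non-empty) partition with a symbolic
// initial balance, producing its closed-form summary.
PartitionSummary runSymbolically(const std::vector<int64_t>& deltas) {
    // After the first element the state is max(s + deltas[0], 0).
    PartitionSummary f{deltas.front(), 0};
    for (auto it = deltas.begin() + 1; it != deltas.end(); ++it) {
        // One loop iteration applied to max(s + shift, floor):
        //   max(max(s + shift, floor) + d, 0)
        // = max(s + (shift + d), max(floor + d, 0))
        f.shift += *it;
        f.floor = std::max(f.floor + *it, int64_t{0});
    }
    return f;
}

// Compose summaries of two adjacent partitions: (g after f)(s) = g(f(s)).
PartitionSummary compose(const PartitionSummary& f, const PartitionSummary& g) {
    return {f.shift + g.shift, std::max(f.floor + g.shift, g.floor)};
}

int main() {
    std::vector<int64_t> all = {5, -3, -7, 4, 2, -1, -9, 6};

    // Sequential reference: the original loop with its carried dependence.
    int64_t sequential = 0;
    for (int64_t d : all) sequential = std::max(sequential + d, int64_t{0});

    // "Parallel" evaluation: summarize each partition independently,
    // then compose the summaries in partition order and plug in s = 0.
    std::vector<int64_t> part1(all.begin(), all.begin() + 4);
    std::vector<int64_t> part2(all.begin() + 4, all.end());
    PartitionSummary merged = compose(runSymbolically(part1),
                                      runSymbolically(part2));
    int64_t parallel = std::max(int64_t{0} + merged.shift, merged.floor);

    std::cout << sequential << " == " << parallel << '\n';  // prints 6 == 6
    return 0;
}
```

In the same spirit, the abstract's symbolic data types hide this kind of summary behind ordinary-looking C++ values, with specialized decision procedures to keep symbolic evaluation cheap and the shipped expressions compact.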
