Towards unified ad-hoc data processing

It is important to execute ad-hoc data processing programs efficiently. Rather than constructing complex declarative queries, many users prefer to write procedural code that issues simple queries. Because many of these users are not expert programmers, their programs often perform poorly in practice, and automatically optimizing and efficiently executing such programs is challenging. In this paper, we present UniAD, a system designed to simplify the programming of data processing tasks and to execute user programs efficiently. We propose a novel intermediate representation named UniQL, which uses HOQs to describe the operations performed in programs. Because UniQL combines procedural and declarative logic, we can perform a variety of optimizations across the boundary between procedural and declarative code. We describe these optimizations and conduct extensive empirical studies using UniAD. Experimental results on four benchmarks demonstrate that our techniques can significantly improve the performance of a wide range of data processing programs.
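To make "optimizing across the procedural/declarative boundary" concrete, here is a generic Python sketch. It is not UniAD's implementation and does not use UniQL or HOQs. The tables, the function names, and the `QueryStats` class are hypothetical. The sketch shows a common pattern: a loop issues one simple selection query per outer row. A cross-boundary rewrite replaces that loop with a single scan that aggregates orders by `user_id` in a hash table and then probes the table once per user. This is equivalent to an outer join followed by a GROUP BY.

```python
from collections import defaultdict

# Hypothetical in-memory "tables" used only for illustration.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}, {"id": 3, "name": "cy"}]
orders = [{"user_id": 1, "amount": 10}, {"user_id": 2, "amount": 5},
          {"user_id": 1, "amount": 7}]


class QueryStats:
    """Hypothetical helper that counts how many queries a program issues."""

    def __init__(self):
        self.queries = 0


def select_orders_for(stats, user_id):
    # A simple declarative query: scan orders and filter by user_id.
    stats.queries += 1
    return [o for o in orders if o["user_id"] == user_id]


def totals_procedural(stats):
    # Original procedural program: one query per loop iteration (N queries).
    result = {}
    for u in users:
        result[u["name"]] = sum(o["amount"] for o in select_orders_for(stats, u["id"]))
    return result


def totals_rewritten(stats):
    # Rewritten program: the loop and its per-iteration query are fused into
    # one scan that aggregates amounts per user_id in a hash table, followed
    # by a lookup for each user (missing users default to 0).
    stats.queries += 1
    by_user = defaultdict(int)
    for o in orders:
        by_user[o["user_id"]] += o["amount"]
    return {u["name"]: by_user[u["id"]] for u in users}
```

Both versions return `{"ann": 17, "bob": 5, "cy": 0}`. The procedural version issues one query per user (three here), while the rewritten version scans `orders` once. An optimizer can justify this kind of rewrite only if it reasons about the loop and the query together, which a purely declarative optimizer cannot do.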
