SharedDB: Killing One Thousand Queries With One Stone

Traditional database systems are built around the query-at-a-time model. This approach tries to optimize performance in a best-effort way. Unfortunately, best effort is not good enough for many modern applications. These applications require response time guarantees in high load situations. This paper describes the design of a new database architecture that is based on batching queries and shared computation across possibly hundreds of concurrent queries and updates. Performance experiments with the TPC-W benchmark show that the performance of our implementation, SharedDB, is indeed robust across a wide range of dynamic workloads.

[1]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[2]  Sheldon J. Finkelstein Common expression analysis in database applications , 1982, SIGMOD '82.

[3]  Leonard D. Shapiro,et al.  Join processing in database systems with large main memories , 1986, TODS.

[4]  Timos K. Sellis,et al.  Multiple-query optimization , 1988, TODS.

[5]  Hamid Pirahesh,et al.  Extensible query processing in starburst , 1989, SIGMOD '89.

[6]  Phillip M. Fernandez Red brick warehouse: a read-mostly RDBMS for open SMP platforms , 1994, SIGMOD '94.

[7]  Divesh Srivastava,et al.  Semantic Data Caching and Replacement , 1996, VLDB.

[8]  Sven Helmer,et al.  Evaluation of Main Memory Join Algorithms for Joins with Set Comparison Join Predicates , 1996, VLDB.

[9]  Surajit Chaudhuri,et al.  An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server , 1997, VLDB.

[10]  Joseph M. Hellerstein,et al.  Eddies: continuously adaptive query processing , 2000, SIGMOD '00.

[11]  Michael J. Franklin,et al.  Cache investment: integrating query optimization and distributed data placement , 2000, TODS.

[12]  S. Sudarshan,et al.  Pipelining in multi-query optimization , 2001, PODS '01.

[13]  Bernhard Seeger,et al.  Progressive Merge Join: A Generic and Non-blocking Sort-based Join Algorithm , 2002, VLDB.

[14]  Hao Zhang,et al.  Path sharing and predicate evaluation for high-performance XML filtering , 2003, TODS.

[15]  Anastasia Ailamaki,et al.  StagedDB: Designing Database Servers for Modern Hardware , 2005, IEEE Data Eng. Bull..

[16]  Anastasia Ailamaki,et al.  QPipe: a simultaneously pipelined relational query engine , 2005, SIGMOD '05.

[17]  Donald Kossmann,et al.  Batched processing for information filters , 2005, 21st International Conference on Data Engineering (ICDE'05).

[18]  Michael Stonebraker,et al.  The End of an Architectural Era (It's Time for a Complete Rewrite) , 2007, VLDB.

[19]  Marcin Zukowski,et al.  Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS , 2007, VLDB.

[20]  Marcin Zukowski,et al.  Vectorized data processing on the cell broadband engine , 2007, DaMoN '07.

[21]  Frederick Reiss,et al.  Constant-Time Query Processing , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[22]  Frederick Reiss,et al.  Main-memory scan sharing for multi-core CPUs , 2008, Proc. VLDB Endow..

[23]  Martin L. Kersten,et al.  An architecture for recycling intermediates in a column-store , 2009, SIGMOD Conference.

[24]  George Candea,et al.  A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses , 2009, Proc. VLDB Endow..

[25]  Pradeep Dubey,et al.  Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs , 2009, Proc. VLDB Endow..

[26]  Gustavo Alonso,et al.  Predictable Performance for Unpredictable Workloads , 2009, Proc. VLDB Endow..

[27]  Subramanian Arumugam,et al.  The DataPath system: a data-centric analytic processing engine for large data warehouses , 2010, SIGMOD Conference.

[28]  George Candea,et al.  Predictable performance and high query concurrency for data analytics , 2011, The VLDB Journal.

[29]  Gustavo Alonso,et al.  Database engines on multicores, why parallelize when you can distribute? , 2011, EuroSys '11.