Modular data storage with Anvil

Databases have achieved orders-of-magnitude performance improvements by changing the layout of stored data -- for instance, by arranging data in columns or compressing it before storage. These improvements have been implemented in monolithic new engines, however, making it difficult to experiment with feature combinations or extensions. We present Anvil, a modular and extensible toolkit for building database back ends. Anvil's storage modules, called dTables, have much finer granularity than prior work. For example, some dTables specialize in writing data, while others provide optimized read-only formats. This specialization makes both kinds of dTable simple to write and understand. Unifying dTables implement more comprehensive functionality by layering over other dTables -- for instance, building a read/write store from read-only tables and a writable journal, or building a general-purpose store from optimized special-purpose stores. The dTable design leads to a flexible system powerful enough to implement many database storage layouts. Our prototype implementation of Anvil performs up to 5.5 times faster than an existing B-tree-based database back end on conventional workloads, and can easily be customized for further gains on specific data and workloads.

[1]  Michael Stonebraker,et al.  C-Store: A Column-oriented DBMS , 2005, VLDB.

[2]  Rudolf Bayer,et al.  Organization and maintenance of large ordered indexes , 1972, Acta Informatica.

[3]  Patrick E. O'Neil,et al.  The log-structured merge-tree (LSM-tree) , 1996, Acta Informatica.

[4]  Erez Zadok,et al.  Versatility and Unix semantics in namespace unification , 2006, TOS.

[5]  RosenblumMendel,et al.  The design and implementation of a log-structured file system , 1991 .

[6]  Jason Flinn,et al.  Rethink the sync , 2006, OSDI '06.

[7]  Lei Zhang,et al.  Generalized file system dependencies , 2007, SOSP.

[8]  Michael Stonebraker,et al.  The End of an Architectural Era (It's Time for a Complete Rewrite) , 2007, VLDB.

[9]  Mendel Rosenblum,et al.  The design and implementation of a log-structured file system , 1991, SOSP '91.

[10]  Peter Boncz,et al.  UvA-DARE ( Digital Academic Repository ) Monet ; a next-Generation DBMS Kernel For Query-Intensive Applications , 2007 .

[11]  Don S. Batory,et al.  GENESIS: An Extensible Database Management System , 1988, IEEE Trans. Software Eng..

[12]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[13]  Daniel J. Abadi,et al.  Integrating compression and execution in column-oriented database systems , 2006, SIGMOD Conference.

[14]  Hamid Pirahesh,et al.  A data management extension architecture , 1987, SIGMOD '87.

[15]  Sven Helmer,et al.  The implementation and performance of compressed databases , 2000, SGMD.

[16]  Michael Stonebraker,et al.  The POSTGRES next generation database management system , 1991, CACM.

[17]  Abraham Lempel,et al.  A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.

[18]  Alistair Moffat,et al.  Efficient online index construction for text databases , 2008, TODS.

[19]  Daniel J. Abadi,et al.  Performance tradeoffs in read-optimized databases , 2006, VLDB.

[20]  Eric A. Brewer,et al.  Stasis: flexible transactional storage , 2006, OSDI '06.

[21]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[22]  Peter M. Chen,et al.  Free transactions with Rio Vista , 1997, SOSP.

[23]  R. Bayer,et al.  Organization and maintenance of large ordered indices , 1970, SIGFIDET '70.

[24]  Michael Stonebraker,et al.  OLTP through the looking glass, and what we found there , 2008, SIGMOD Conference.

[25]  Eric A. Brewer,et al.  Rose: compressed, log-structured replication , 2008, Proc. VLDB Endow..