C-Store: A Column-oriented DBMS

This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. Among the many differences in its design are: storage of data by column rather than by row; careful coding and packing of objects into storage, including main memory during query processing; storage of an overlapping collection of column-oriented projections rather than the current fare of tables and indexes; a non-traditional implementation of transactions that includes high availability and snapshot isolation for read-only transactions; and the extensive use of bitmap indexes to complement B-tree structures. We present preliminary performance data on a subset of TPC-H and show that the system we are building, C-Store, is substantially faster than popular commercial products. Hence, the architecture looks very encouraging.
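The storage ideas listed in the abstract can be illustrated with a toy sketch. It is not C-Store's implementation: the table, the column names, and the helpers `to_columns`, `make_projection`, and `rle_encode` are all hypothetical, and run-length encoding stands in as one simple example of column coding, since the abstract does not name a specific scheme.

```python
from itertools import groupby

# Hypothetical toy table; column names and values are illustrative only.
rows = [
    {"order_id": 1, "status": "shipped", "price": 10.0},
    {"order_id": 2, "status": "open", "price": 25.0},
    {"order_id": 3, "status": "shipped", "price": 7.5},
    {"order_id": 4, "status": "open", "price": 12.0},
]


def to_columns(rows, names):
    """Pivot a list of row dicts into one list per column."""
    return {name: [r[name] for r in rows] for name in names}


def make_projection(rows, names, sort_key):
    """Keep a subset of columns, all stored in the order given by sort_key."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    return to_columns(ordered, names)


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Two overlapping projections: both hold "price", each in a different sort order.
by_status = make_projection(rows, ["status", "price"], sort_key="status")
by_order = make_projection(rows, ["order_id", "price"], sort_key="order_id")

# Sorting on status groups equal values together, so the column encodes compactly.
status_rle = rle_encode(by_status["status"])

# A query that needs only price reads a single column rather than whole rows.
total_price = sum(by_order["price"])
```

In this sketch the same `price` column appears in both projections, which is the sense in which projections overlap; a real system would pick the projection whose sort order best suits each query.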
