Paper Information - C-Store: A Column-oriented DBMS

C-Store: A Column-oriented DBMS

This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. Among the many differences in its design are: storage of data by column rather than by row; careful coding and packing of objects into storage, including main memory during query processing; storing an overlapping collection of column-oriented projections, rather than the current fare of tables and indexes; a non-traditional implementation of transactions which includes high availability and snapshot isolation for read-only transactions; and the extensive use of bitmap indexes to complement B-tree structures. We present preliminary performance data on a subset of TPC-H and show that the system we are building, C-Store, is substantially faster than popular commercial products. Hence, the architecture looks very encouraging.

[1] Donald D. Chamberlin, et al. SEQUEL: A structured English query language, 1974, SIGFIDET '74.

[2] Stephen N. Zilles, et al. Programming with abstract data types, 1974.

[3] V. Kevin M. Whitney, et al. Relational data management implementation techniques, 1974, SIGFIDET '74.

[4] E. F. Codd, et al. The relational and network approaches: Comparison of the application programming interfaces, 1975, SIGFIDET '74.

[5] Eugene Wong, et al. Decomposition - a strategy for query processing, 1976, TODS.

[6] Irving L. Traiger, et al. The notions of consistency and predicate locks in a database system, 1976, CACM.

[7] Irving L. Traiger, et al. System R: relational approach to database management, 1976, TODS.

[8] Donald D. Chamberlin, et al. SEQUEL 2: A Unified Approach to Data Definition, Manipulation, and Control, 1976, IBM J. Res. Dev.

[9] D. J. De Witt, et al. Direct - A Multiprocessor Organization for Supporting Relational Database Management Systems, 1979.

[10] David J. DeWitt, et al. Query execution in DIRECT, 1979, SIGMOD '79.

[11] Patricia G. Selinger, et al. Access path selection in a relational database management system, 1979, SIGMOD '79.

[12] David J. DeWitt, et al. Benchmarking Database Systems: A Systematic Approach, 1983, VLDB.

[13] E. F. Codd, et al. A relational model of data for large shared data banks, 1970, CACM.

[14] David Maier, et al. Making smalltalk a database system, 1984, SIGMOD '84.

[15] Antonin Guttman, et al. R-trees: a dynamic index structure for spatial searching, 1984, SIGMOD '84.

[16] Michael Stonebraker, et al. The Case for Shared Nothing, 1985, HPTS.

[17] Reind P. van de Riet, et al. Expert database systems, 1986, Future Gener. Comput. Syst.

[18] David J. DeWitt, et al. GAMMA - A High Performance Dataflow Database Machine, 1986, VLDB.

[19] Andreas Reuter, et al. Tandem Database Group - NonStop SQL: A Distributed, High-Performance, High-Availability Implementation of SQL, 1987, HPTS.

[20] Bruce G. Lindsay, et al. A retrospective of R*: A distributed database management system, 1987, Proceedings of the IEEE.

[21] Ian H. Witten, et al. Arithmetic coding for data compression, 1987, CACM.

[22] Tom W. Keller, et al. Data placement in Bubba, 1988, SIGMOD '88.

[23] A. Robbin, et al. Creating SIPP longitudinal analysis files using a relational database management system, 1988.

[24] Hamid Pirahesh, et al. Extensible query processing in starburst, 1989, SIGMOD '89.

[25] Donald D. Chamberlin, et al. Access Path Selection in a Relational Database Management System, 1989.

[26] Donovan A. Schneider, et al. The Gamma Database Machine Project, 1990, IEEE Trans. Knowl. Data Eng.

[27] Jennifer Widom, et al. Deriving Production Rules for Incremental View Maintenance, 1991, VLDB.

[28] Goetz Graefe, et al. Data compression and database performance, 1991, [Proceedings] 1991 Symposium on Applied Computing.

[29] Andreas Reuter, et al. Transaction Processing: Concepts and Techniques, 1992.

[30] Wei Hong, et al. Exploiting inter-operation parallelism in XPRS, 1992, SIGMOD '92.

[31] David J. DeWitt, et al. Parallel database systems: the future of high performance database systems, 1992, CACM.

[32] David J. DeWitt, et al. Parallel Database Systems: The Future of High Performance Database Processing, 1992.

[33] Hamid Pirahesh, et al. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging, 1998.

[34] Michael A. Olson, et al. The Design and Implementation of the Inversion File System, 1993, USENIX Winter.

[35] Mark A. Roth, et al. Database compression, 1993, SGMD.

[36] Goetz Graefe, et al. Query evaluation techniques for large databases, 1993, CSUR.

[37] David J. DeWitt, et al. Shoring up persistent applications, 1994, SIGMOD '94.

[38] Clark D. French, et al. “One size fits all” database architectures do not work for DSS, 1995, SIGMOD '95.

[39] Jim Gray, et al. A critique of ANSI SQL isolation levels, 1995, SIGMOD '95.

[40] Jeffrey F. Naughton, et al. Generalized Search Trees for Database Systems, 1995, VLDB.

[41] Laura M. Haas, et al. Towards heterogeneous multimedia information systems: the Garlic approach, 1995, Proceedings RIDE-DOM'95. Fifth International Workshop on Research Issues in Data Engineering-Distributed Object Management.

[42] Matthias Jarke, et al. Incremental Maintenance of Externally Materialized Views, 1996, VLDB.

[43] U. M. Feyyad. Data mining and knowledge discovery: making sense out of data, 1996.

[44] Patrick E. O'Neil, et al. Improved query performance with variant indexes, 1997, SIGMOD '97.

[45] C. Mohan, et al. DB2's Use of the Coupling Facility for Data Sharing, 1997, IBM Syst. J.

[46] Philip S. Yu, et al. Cluster Architectures and S/390 Parallel Sysplex Scalability, 1997, IBM Syst. J.

[47] Clayton M. Christensen. The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail, 2013.

[48] C. Mohan, et al. Concurrency and recovery in generalized search trees, 1997, SIGMOD '97.

[49] Jeffrey F. Naughton, et al. An array-based algorithm for simultaneous multidimensional aggregates, 1997, SIGMOD '97.

[50] Nick Roussopoulos, et al. DynaMat: a dynamic view management system for data warehouses, 1999, SIGMOD '99.

[51] Sven Helmer, et al. The implementation and performance of compressed databases, 2000, SGMD.

[52] Paul Westerman. Data Warehousing: Using the Wal-Mart Model, 2000.

[53] David J. DeWitt, et al. NiagaraCQ: a scalable continuous query system for Internet databases, 2000, SIGMOD 2000.