Paper Information - Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
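To make the N-space picture concrete, the following is a minimal Python sketch of a cube over a small in-memory table. It is an illustration under stated assumptions, not the paper's algorithm: the function name `cube`, the sample `sales` rows, and the `ALL` placeholder string used to mark an aggregated-away dimension are all choices made for this sketch, and the aggregate is fixed to a sum. It enumerates every subset of the aggregation attributes naively, so it shows what the operator produces rather than how to compute it efficiently.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder marking a dimension that has been aggregated away.
# (An assumption of this sketch; it must not collide with real data values.)
ALL = "ALL"


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`.

    Each key in the result is a tuple with one slot per dimension:
    either a concrete attribute value or ALL. Keys without ALL are the
    base N-dimensional cells; keys containing ALL are super-aggregates
    in a lower-dimensional space.
    """
    result = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                key = tuple(row[d] if d in kept else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)


# Hypothetical sample data, invented for illustration only.
sales = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 60},
]

totals = cube(sales, ["model", "year"], "units")

for key in sorted(totals, key=str):
    print(key, totals[key])
```

Because the result is a flat mapping from attribute tuples to aggregate values, it can be read as a relation with one extra column for the aggregate, which is the property the abstract highlights. The key `("Chevy", ALL)` holds the sub-total for one model across all years, and `(ALL, ALL)` holds the grand total.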
