Using Functional Dependencies for Reducing the Size of a Data Cube

Functional dependencies (FDs) are a powerful concept in data organization. They have proven very useful, e.g., in relational databases, for reducing data redundancy. However, little work has been done so far on using them in the context of data cubes. In the present paper, we propose to characterize the parts of a data cube to be materialized with the help of the FDs present in the underlying data. For this purpose, we consider two applications: (i) how to choose the best cuboids of a data cube to materialize in order to guarantee a fixed query performance, and (ii) how to choose the best tuples, hence partial cuboids, in order to reduce the size of the data cube without losing information. In both cases we show that FDs are fundamental.
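
To make the intuition concrete, the following minimal Python sketch shows why a functional dependency makes some cuboids redundant. It is an illustration only, not the selection method of the paper: the toy fact table, its column names, and the helpers `fd_holds`, `cuboid`, and `cuboid_sizes` are all assumptions introduced here. The sketch checks whether a dependency such as city -> country holds in the data, then confirms that grouping by (city, country) yields exactly as many tuples as grouping by city alone.

```python
from itertools import combinations

# Hypothetical toy fact table; columns and values are made up for illustration.
ROWS = [
    {"city": "Paris", "country": "FR", "product": "pen", "sales": 10},
    {"city": "Paris", "country": "FR", "product": "ink", "sales": 5},
    {"city": "Lyon", "country": "FR", "product": "pen", "sales": 7},
    {"city": "Rome", "country": "IT", "product": "pen", "sales": 3},
    {"city": "Rome", "country": "IT", "product": "ink", "sales": 4},
]
DIMENSIONS = ("city", "country", "product")


def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is stored; any later mismatch
        # violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


def cuboid(rows, dims, measure="sales"):
    """Compute one cuboid: SUM(measure) grouped by the dimensions in dims."""
    result = {}
    for row in rows:
        key = tuple(row[d] for d in dims)
        result[key] = result.get(key, 0) + row[measure]
    return result


def cuboid_sizes(rows, dimensions):
    """Map every subset of the dimensions to the tuple count of its cuboid."""
    sizes = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            sizes[dims] = len(cuboid(rows, dims))
    return sizes


if __name__ == "__main__":
    print("city -> country holds:", fd_holds(ROWS, ("city",), ("country",)))
    for dims, size in cuboid_sizes(ROWS, DIMENSIONS).items():
        print(dims, size)
```

In this toy data, city -> country holds, so the cuboids over (city, country) and (city, country, product) have the same number of tuples as those over (city) and (city, product) respectively. Each of the larger cuboids can be rebuilt from the smaller one by attaching the country determined by the city, which is the kind of redundancy that an FD-aware choice of what to materialize can exploit.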
