Skycube Materialization Using the Topmost Skyline or Functional Dependencies

Given a table T(Id, D1, …, Dd), the skycube of T is the set of skylines with respect to all nonempty subsets (subspaces) of the set of dimensions {D1, …, Dd}. To optimize the evaluation of arbitrary skyline queries, the solutions proposed so far in the literature either (i) precompute all of the skylines or (ii) use compression techniques so that any skyline can be derived with little effort. Solutions of type (i) are appealing because skyline queries then have optimal execution time, but they scale poorly in both time and space because the number of skylines to be materialized is exponential in d. Solutions of type (ii) are attractive in terms of memory consumption but, as we show, they also have a high time complexity. In this article, we contribute to both kinds of solutions. We first observe that skyline patterns are monotonic. This property leads to a simple yet efficient solution for full and partial skycube materialization when the skyline with respect to all dimensions, the topmost skyline, is small. When the topmost skyline is instead large relative to the size of the input table, it turns out that functional dependencies, a fundamental concept in databases, uncover a monotonic property between skylines. Equipped with this information, we show that closed attribute sets are fundamental for partial and full skycube materialization. Extensive experiments with real and synthetic datasets show that our solutions generally outperform state-of-the-art algorithms.
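To make the definition of the skycube concrete, the following minimal sketch materializes it naively by computing one skyline per nonempty subspace. It is not the algorithm proposed in the article; it only illustrates why approach (i) must handle 2^d − 1 skylines. The function names (`dominates`, `skyline`, `skycube`), the convention that smaller values are preferred, and the sample table are assumptions made for illustration.

```python
from itertools import combinations

def dominates(p, q, dims):
    """p dominates q on dims: no worse everywhere, strictly better somewhere.
    Assumption: smaller values are preferred on every dimension."""
    return (all(p[d] <= q[d] for d in dims)
            and any(p[d] < q[d] for d in dims))

def skyline(rows, dims):
    """Ids of the tuples not dominated by any other tuple on dims (quadratic scan)."""
    return {rid for rid, p in rows.items()
            if not any(dominates(q, p, dims) for q in rows.values())}

def skycube(rows, d):
    """Naive full materialization: one skyline per nonempty subset of the d dimensions."""
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            cube[dims] = skyline(rows, dims)
    return cube

# Hypothetical table T(Id, D1, D2), stored as {Id: (D1, D2)}.
rows = {1: (1, 4), 2: (2, 2), 3: (3, 3), 4: (4, 1)}
cube = skycube(rows, 2)
for dims, ids in cube.items():
    print(dims, sorted(ids))
```

On this table the topmost skyline, the one over both dimensions, contains tuples 1, 2, and 4, since tuple 3 is dominated by tuple 2. The outer loops visit every nonempty subspace, so the cost of this naive approach grows exponentially with d.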
