Effectively and efficiently supporting roll-up and drill-down OLAP operations over continuous dimensions via hierarchical clustering

In traditional OLAP systems, roll-up and drill-down operations over data cubes navigate fixed hierarchies defined on discrete attributes, which play the role of dimensions. Emerging application scenarios, such as sensor networks, have stimulated research on OLAP systems in which continuous attributes are also treated as dimensions of analysis and hierarchies are defined over continuous domains. The goal is to avoid having to define an ad-hoc discretization hierarchy along each OLAP dimension in advance. Following this research trend, in this paper we propose a novel method, founded on a density-based hierarchical clustering algorithm, to support roll-up and drill-down operations over OLAP data cubes with continuous dimensions. The method hierarchically clusters dimension instances by also taking fact-table measures into account, so that the resulting clusters are better suited to the analyses likely to be performed over them. Experiments on two well-known multidimensional datasets clearly show the advantages of the proposed solution.
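The abstract does not specify the clustering algorithm, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' method. Each dimension instance is embedded as a point (dimension value, weighted measure). Plain DBSCAN is then run at several neighborhood radii, from coarse to fine, and each run yields one hierarchy level. Roll-up sums the measure per cluster, and drill-down lists the finer clusters inside a coarser one. The function names, the `measure_weight` parameter, the choice of SUM as the aggregate, and the sample data are all hypothetical. Running independent DBSCAN passes is a simplification: core points nest across levels, but border points are not guaranteed to, which a genuine density-based hierarchy such as OPTICS avoids.

```python
from __future__ import annotations

import math
from collections import defaultdict

NOISE = -1  # label given to points that belong to no cluster


def region_query(points: list[tuple[float, float]], i: int, eps: float) -> list[int]:
    """Indices of all points within Euclidean distance eps of points[i], including i."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if math.hypot(x - xi, y - yi) <= eps]


def dbscan(points: list[tuple[float, float]], eps: float, min_pts: int) -> list[int]:
    """Plain DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels: list[int | None] = [None] * len(points)
    cluster = NOISE
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise reachable from a core point: border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels


def build_hierarchy(dim_values, measures, eps_levels, min_pts, measure_weight=1.0):
    """One DBSCAN labeling per eps value, listed coarse (large eps) to fine (small eps).

    Hypothetical embedding: (dimension value, measure_weight * measure), so instances
    that are close on the dimension but differ in the measure can be separated.
    """
    points = [(d, measure_weight * m) for d, m in zip(dim_values, measures)]
    return [dbscan(points, eps, min_pts) for eps in eps_levels]


def roll_up(labels, dim_values, measures):
    """SUM(measure) per cluster at one level, with each cluster's [min, max] dimension range."""
    groups = defaultdict(list)
    for label, d, m in zip(labels, dim_values, measures):
        groups[label].append((d, m))
    return {
        label: ((min(d for d, _ in members), max(d for d, _ in members)),
                sum(m for _, m in members))
        for label, members in groups.items()
    }


def drill_down(coarse_labels, fine_labels, coarse_cluster):
    """Sorted labels of the finer-level clusters whose members fall in coarse_cluster."""
    return sorted({f for c, f in zip(coarse_labels, fine_labels)
                   if c == coarse_cluster and f != NOISE})


if __name__ == "__main__":
    # Hypothetical sensor readings: continuous dimension = temperature, measure = energy.
    temps = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0]
    energy = [1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 1.0]
    coarse, fine = build_hierarchy(temps, energy, eps_levels=[9.0, 1.5], min_pts=2)
    print(roll_up(coarse, temps, energy))  # {0: ((1.0, 6.0), 30.0), -1: ((20.0, 20.0), 1.0)}
    print(roll_up(fine, temps, energy))    # {0: ((1.0, 3.0), 3.0), 1: ((4.0, 6.0), 27.0), -1: ((20.0, 20.0), 1.0)}
    print(drill_down(coarse, fine, 0))     # [0, 1]
```

In this toy run, the fine level splits the temperature range [1, 6] at the jump in energy. If all measures are set to zero, the same radii leave [1, 6] as a single fine cluster, which is the sense in which, in this sketch, measures shape the hierarchy.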
