Smoothing over Summary Information in Data Cubes

Decision support usually requires drawing statistical information that is interesting and useful to its users from a huge data warehouse. A typical data model supporting a data warehouse is the multidimensional database, also known as a data cube. A data cube contains cells, each associated with some summary information, or aggregate, on which decisions are based. In real-life databases, however, the nature of the contents tends to make the data distribution clustered and sparse, and in general the sparsity worsens as the number of cells increases. Cells whose support falls below a certain threshold must be combined with adjacent cells to acquire sufficient support. Otherwise, the lack of support can yield incomplete or biased results.

Our main focus in this paper is to find approximations for the missing or biased aggregates of cells that have missing or low support. We call this approximation process smoothing. We propose a smoothing function that smooths well over a quantitative attribute while still preserving local detail. Our method also adapts to sudden changes in the data distribution, called discontinuities, which inevitably occur in real-life data.
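To make the idea concrete, here is a minimal sketch of discontinuity-aware smoothing along one quantitative dimension of a cube. It is not the smoothing function proposed in the paper, whose exact form the abstract does not give. Instead it swaps in a generic weighted-regularization technique. Each cell stores a count (its support) and a sum, and the sketch estimates a per-cell mean by trading off fidelity to the observed mean, weighted by support, against similarity to neighbouring cells. Edges between two well-supported cells whose means jump sharply are treated as discontinuities and not smoothed across. The names `smooth_cells`, `detect_breaks`, `min_support`, `lam`, and `jump`, and the break-detection rule itself, are hypothetical choices made for illustration.

```python
from typing import List, Optional


def detect_breaks(counts: List[int], means: List[Optional[float]],
                  min_support: int, jump: float) -> List[bool]:
    """smooth[i] is False when the edge between cells i and i+1 is treated
    as a discontinuity. Hypothetical rule: both cells meet the support
    threshold and their observed means differ by more than `jump`."""
    smooth = []
    for i in range(len(counts) - 1):
        both_supported = (counts[i] >= min_support
                          and counts[i + 1] >= min_support)
        # Short-circuit: means are only compared when both cells are non-empty.
        is_jump = both_supported and abs(means[i + 1] - means[i]) > jump
        smooth.append(not is_jump)
    return smooth


def smooth_cells(counts: List[int], sums: List[float], min_support: int = 5,
                 lam: float = 1.0, jump: float = 10.0,
                 max_iters: int = 1000, tol: float = 1e-9) -> List[float]:
    """Estimate a per-cell mean (sum / count) along one dimension by minimising
        sum_i count_i * (u_i - mean_i)^2
          + lam * sum over smoothed edges of (u_{i+1} - u_i)^2
    using Gauss-Seidel sweeps. Assumes at least one non-empty cell."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    n = len(counts)
    means = [s / c if c > 0 else None for c, s in zip(counts, sums)]
    smooth = detect_breaks(counts, means, min_support, jump)
    # Start from the observed means; empty cells start at 0.0.
    u = [m if m is not None else 0.0 for m in means]
    for _ in range(max_iters):
        max_change = 0.0
        for i in range(n):
            # Data term: weight equals support, so sparse cells lean on neighbours.
            num = counts[i] * means[i] if means[i] is not None else 0.0
            den = float(counts[i])
            # Smoothness term: only across edges not marked as discontinuities.
            if i > 0 and smooth[i - 1]:
                num += lam * u[i - 1]
                den += lam
            if i < n - 1 and smooth[i]:
                num += lam * u[i + 1]
                den += lam
            if den > 0:
                new = num / den
                max_change = max(max_change, abs(new - u[i]))
                u[i] = new
        if max_change < tol:
            break
    return u


if __name__ == "__main__":
    # Jump of 40 between cells 1 and 2 exceeds `jump`, so it is kept.
    print(smooth_cells([20, 20, 20, 20], [200, 200, 1000, 1000]))
    # An empty middle cell is filled from its neighbours (about 11.0).
    print(smooth_cells([20, 20, 0, 20, 20], [200, 200, 0, 240, 240]))
```

The first call prints `[10.0, 10.0, 50.0, 50.0]`: the break isolates the two levels, so neither is pulled toward the other. Raising `jump` above 40 removes the break, and cells 1 and 2 then drift toward each other. Weighting the data term by support is what lets well-supported cells stay close to their observed aggregates while empty or sparse cells take their values mostly from neighbours.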

Sam Yuan Sung | Stephen Huang | Arthur Ramer