Supporting imprecision in multidimensional databases using granularities

Online analytical processing (OLAP) technologies are widely used, but the lack of effective means of handling data imprecision, which occurs when values are not known precisely or are missing entirely, is a major obstacle to applying these technologies in many domains. The paper develops techniques for handling imprecision that aim to maximally reuse existing OLAP modeling constructs such as dimension hierarchies and granularities. When the database contains imprecise data, queries are tested to determine whether they can be answered precisely given the available data; if not, alternative queries unaffected by the imprecision are suggested. For queries that are affected by imprecision, techniques are proposed that take the imprecision into account in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. The approach can exploit existing OLAP query processing techniques such as pre-aggregation, yielding an effective approach that has low computational overhead and may be implemented using current technology.
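To make the query test concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes a single Time dimension with a linear hierarchy, and every name in it (`LEVELS`, `Fact`, `answerable_precisely`, `suggest_level`) is hypothetical. Each fact records the granularity at which its time value is known; a query grouping at some level is treated as precisely answerable only if every fact is known at that level or finer, and otherwise the finest level unaffected by the imprecision is suggested instead.

```python
from dataclasses import dataclass

# Hypothetical linear hierarchy for a Time dimension, finest level first.
LEVELS = ["day", "month", "year", "all"]


@dataclass
class Fact:
    value: float
    time_level: str  # granularity at which this fact's time value is known


def rank(level: str) -> int:
    """Position of a level in the hierarchy (0 = finest)."""
    return LEVELS.index(level)


def answerable_precisely(facts: list[Fact], group_level: str) -> bool:
    """True if every fact is known at group_level or at a finer level."""
    return all(rank(f.time_level) <= rank(group_level) for f in facts)


def suggest_level(facts: list[Fact], group_level: str) -> str:
    """Finest level at or above group_level that no fact's imprecision affects."""
    coarsest = max((rank(f.time_level) for f in facts), default=0)
    return LEVELS[max(coarsest, rank(group_level))]


facts = [Fact(10.0, "day"), Fact(5.0, "month"), Fact(7.0, "day")]
print(answerable_precisely(facts, "day"))    # one fact is only known per month
print(answerable_precisely(facts, "month"))  # all facts are known per month or finer
print(suggest_level(facts, "day"))           # alternative grouping level
```

Real dimension hierarchies are generally partial orders rather than a single chain, so an actual implementation would compare granularities through the hierarchy's ordering instead of a list index.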
