Efficient cube computing on an extended multidimensional model over uncertain data

Data uncertainty is inherent in many applications, arising from measurement errors, incomplete data, and similar causes. On-Line Analytical Processing (OLAP) is a powerful method for analyzing large data warehouses, and OLAP over uncertain data has become an important research problem because of the growing demand for handling uncertainty in multidimensional data. In this paper, we first describe our UStar-Schema model, which extends the traditional OLAP model to support uncertain dimension attributes and uncertain measures in the fact table, as well as uncertainty in dimension tables. We then extend the processing model for aggregate queries and cube computation to the UStar-Schema. Second, we design a novel index structure, the PSI-Index, on the UStar-Schema to improve the efficiency of OLAP querying and cube computation. Furthermore, we propose an advanced index structure, the HB-Index, together with an efficient algorithm that accelerates iceberg cube computation on our model by using pruning techniques to eliminate large amounts of unnecessary computation. Finally, we perform extensive experiments to evaluate the efficiency and effectiveness of the proposed techniques.
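The abstract does not spell out how aggregates over uncertain attributes are evaluated or how iceberg pruning works on the UStar-Schema, and the PSI-Index and HB-Index are not described here. As a rough illustration only, and not the paper's algorithm, the sketch below assumes a simple possible-worlds reading in which each uncertain dimension attribute is an independent distribution over members and each measure is independent of its tuple's dimension attributes. It computes expected COUNT and expected SUM per cell and applies BUC-style bottom-up pruning on expected COUNT, which stays valid because refining a cell can only shrink each tuple's membership probability. The names `UncertainFact` and `iceberg_cube` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation (an assumption, not the paper's UStar-Schema):
# each uncertain dimension attribute is a distribution {member: probability},
# and the measure is a distribution {value: probability}.
Dist = Dict[str, float]
Cell = Tuple[Optional[str], ...]  # None stands for ALL ('*') on that dimension


class UncertainFact:
    def __init__(self, dims: List[Dist], measure: Dict[float, float]):
        self.dims = dims
        self.measure = measure

    def expected_measure(self) -> float:
        return sum(value * p for value, p in self.measure.items())


def iceberg_cube(facts: List[UncertainFact],
                 min_count: float) -> Dict[Cell, Tuple[float, float]]:
    """Return {cell: (expected COUNT, expected SUM)} for every cell whose
    expected COUNT is at least min_count.

    Assumes dimension attributes are independent across dimensions and that
    each measure is independent of its tuple's dimension attributes.
    """
    if not facts:
        return {}
    n_dims = len(facts[0].dims)
    result: Dict[Cell, Tuple[float, float]] = {}

    def buc(weighted: List[Tuple[UncertainFact, float]],
            cell: List[Optional[str]], start: int) -> None:
        # Each weight is the probability that the fact falls in this cell.
        exp_count = sum(w for _, w in weighted)
        if exp_count < min_count:
            # Prune: refining a cell multiplies weights by probabilities <= 1,
            # so no descendant cell can reach the threshold either.
            return
        exp_sum = sum(w * f.expected_measure() for f, w in weighted)
        result[tuple(cell)] = (exp_count, exp_sum)
        for d in range(start, n_dims):
            # Partition on dimension d; an uncertain fact contributes a
            # fractional weight to every member it may take.
            parts: Dict[str, List[Tuple[UncertainFact, float]]] = {}
            for f, w in weighted:
                for member, p in f.dims[d].items():
                    parts.setdefault(member, []).append((f, w * p))
            for member, part in sorted(parts.items()):
                cell[d] = member
                buc(part, cell, d + 1)
                cell[d] = None  # restore ALL before trying the next member

    buc([(f, 1.0) for f in facts], [None] * n_dims, 0)
    return result
```

Expected COUNT is used as the iceberg condition here because it is anti-monotone; a condition that is not anti-monotone, such as an expected AVG threshold, would need a different pruning strategy.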
