Trustworthy answers for top-k queries on uncertain Big Data in decision making

Effectively extracting reliable and trustworthy information from Big Data has become crucial for large business enterprises. Obtaining useful knowledge that supports better decisions and improves business performance is not a trivial task. The most fundamental challenge in Big Data extraction is handling data uncertainty to meet emerging business needs, such as marketing analysis, future prediction and decision making. In this paper, we first propose a novel approach called Dominating Top-k Aggregate Query (DA-Topk), which combines skyline and top-k query techniques to provide trustworthy and reliable knowledge from uncertain Big Data. We then design a number of pruning rules to reduce the search space and terminate the ranking process as early as possible. Next, we provide a deeper analysis of how our approach and existing approaches satisfy six ranking properties (i.e. exact-k, containment, unique-rank, value-invariance, stability and faithfulness), demonstrating that our method is the only one that satisfies all of them. Extensive experiments on both real and synthetic data sets verify the efficiency and effectiveness of our approach compared with state-of-the-art approaches. Our approach can help managers make strategic decisions quickly and accurately in competitive marketplaces.
