Discovering the Most Influential Sites over Uncertain Data: A Rank-Based Approach

With the rapidly increasing availability of uncertain data in many important applications such as location-based services, sensor monitoring, and biological information management systems, uncertainty-aware query processing has received a significant amount of research effort from the database community in recent years. In this paper, we investigate a new type of query in the context of uncertain databases, namely uncertain top-k influential sites query (UTkIS query for short), which can be applied in a wide range of application areas such as marketing analysis and mobile services. Since it is not so straightforward to precisely define the semantics of top-k query with uncertain data, in this paper we introduce a novel and more intuitive formulation of the query on the basis of expected rank semantics. To address the efficiency issue caused by possible worlds exploration, we propose effective pruning rules and a divide-and-conquer paradigm such that the number of candidates as well as the number of possible worlds to be considered can be significantly reduced. Finally, we conduct extensive experiments on real data sets to verify the effectiveness and efficiency of the new methods proposed in this paper.

[1]  Shashi Shekhar,et al.  Continuous Evaluation of Monochromatic and Bichromatic Reverse Nearest Neighbors , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[2]  Feifei Li,et al.  Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations , 2008, IEEE Transactions on Knowledge and Data Engineering.

[3]  Jeffrey Scott Vitter,et al.  Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data , 2004, VLDB.

[4]  Yufei Tao,et al.  Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions , 2005, VLDB.

[5]  Reynold Cheng,et al.  Evaluating probability threshold k-nearest-neighbor queries over uncertain data , 2009, EDBT '09.

[6]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[7]  Reynold Cheng,et al.  Efficient Evaluation of Imprecise Location-Dependent Queries , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[8]  Xiang Lian,et al.  Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data , 2009, The VLDB Journal.

[9]  Jian Li,et al.  A unified approach to ranking in probabilistic databases , 2009, The VLDB Journal.

[10]  Jian Pei,et al.  Ranking queries on uncertain data: a probabilistic threshold approach , 2008, SIGMOD Conference.

[11]  Chi-Yin Chow,et al.  Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[12]  Christopher Ré,et al.  Efficient Top-k Query Evaluation on Probabilistic Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[13]  Sunil Prabhakar,et al.  Evaluating probabilistic queries over imprecise data , 2003, SIGMOD '03.

[14]  Rynson W. H. Lau,et al.  Knowledge and Data Engineering for e-Learning Special Issue of IEEE Transactions on Knowledge and Data Engineering , 2008 .

[15]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[16]  Xiang Lian,et al.  Probabilistic Group Nearest Neighbor Queries in Uncertain Databases , 2008, IEEE Transactions on Knowledge and Data Engineering.

[17]  Hans-Peter Kriegel,et al.  Probabilistic Nearest-Neighbor Query on Uncertain Objects , 2007, DASFAA.

[18]  Divyakant Agrawal,et al.  Reverse Nearest Neighbor Queries for Dynamic Databases , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[19]  Hans-Peter Kriegel,et al.  Probabilistic Similarity Join on Uncertain Data , 2006, DASFAA.

[20]  Yufei Tao,et al.  Reverse kNN Search in Arbitrary Dimensionality , 2004, VLDB.

[21]  Mohamed A. Soliman,et al.  Top-k Query Processing in Uncertain Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[22]  Xi Zhang,et al.  On the semantics and evaluation of top-k queries in probabilistic databases , 2008, ICDE Workshops.

[23]  Yang Du,et al.  On Computing Top-t Most Influential Spatial Sites , 2005, VLDB.

[24]  Ambuj K. Singh,et al.  Top-k Spatial Joins of Probabilistic Objects , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[25]  Xuemin Lin,et al.  Efficient rank based KNN query processing over uncertain data , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[26]  King-Ip Lin,et al.  An index structure for efficient reverse nearest neighbor queries , 2001, Proceedings 17th International Conference on Data Engineering.

[27]  Yufei Tao,et al.  Reverse Nearest Neighbor Search in Metric Spaces , 2006, IEEE Transactions on Knowledge and Data Engineering.

[28]  Bin Jiang,et al.  Probabilistic Skylines on Uncertain Data , 2007, VLDB.

[29]  Stanley B. Zdonik,et al.  Top-k queries on uncertain data: on score distribution and typical answers , 2009, SIGMOD Conference.

[30]  Xiang Lian,et al.  Monochromatic and bichromatic reverse skyline search over uncertain databases , 2008, SIGMOD Conference.

[31]  Xi Zhang,et al.  Semantics and evaluation of top-k queries in probabilistic databases , 2008, 2008 IEEE 24th International Conference on Data Engineering Workshop.

[32]  Feifei Li,et al.  Semantics of Ranking Queries for Probabilistic Data and Expected Ranks , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[33]  S. Muthukrishnan,et al.  Influence sets based on reverse nearest neighbor queries , 2000, SIGMOD '00.