Ranking distributed database in tuple-level uncertainty

Ranking in uncertain database environments has recently gained considerable importance. Many techniques have been introduced for ranking uncertain databases, and others for ranking distributed certain databases. However, few techniques address ranking in distributed uncertain databases. This paper proposes a framework that improves ranking processing when the database is both uncertain and distributed. Within this framework, new communication-efficient and computation-efficient algorithms are investigated for retrieving the top-k tuples from distributed sites. These algorithms operate under tuple-level uncertainty. Their main goal is to reduce both the number of communication rounds and the amount of data transmitted while still ranking efficiently. Experimental results show that both proposed algorithms significantly reduce communication cost. The results also indicate that the first algorithm is efficient when the number of sites is low, while the second achieves better performance when the number of sites is higher.
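
To make the problem setting concrete, the following is a minimal sketch of top-k retrieval over distributed sites under tuple-level uncertainty, where each tuple carries a score and a membership probability. It is not either of the algorithms proposed in the paper: it is a naive one-round baseline, and it assumes expected score (score times probability) as the ranking function, which is only one of several ranking semantics used for uncertain data. The names `UTuple`, `expected_score`, `local_top_k`, and `global_top_k` are hypothetical and chosen for illustration.

```python
import heapq
from typing import List, NamedTuple


class UTuple(NamedTuple):
    """A tuple under tuple-level uncertainty (illustrative model)."""
    tid: str
    score: float
    prob: float  # probability that the tuple exists in the database


def expected_score(t: UTuple) -> float:
    # Assumed ranking function: score weighted by membership probability.
    return t.score * t.prob


def local_top_k(site: List[UTuple], k: int) -> List[UTuple]:
    # Each site ranks its own tuples and ships only its k best candidates.
    return heapq.nlargest(k, site, key=expected_score)


def global_top_k(sites: List[List[UTuple]], k: int) -> List[UTuple]:
    # One communication round: the coordinator merges at most k tuples per site.
    candidates = [t for site in sites for t in local_top_k(site, k)]
    return heapq.nlargest(k, candidates, key=expected_score)


# Toy data: two sites holding uncertain tuples.
sites = [
    [UTuple("a1", 90.0, 0.5), UTuple("a2", 60.0, 0.9), UTuple("a3", 30.0, 1.0)],
    [UTuple("b1", 80.0, 0.75), UTuple("b2", 70.0, 0.25)],
]
result = global_top_k(sites, 2)
print([t.tid for t in result])
```

Because the assumed ranking key depends on each tuple alone, the global top-k is always contained in the union of the local top-k lists, so one round suffices here. The cost is that every site transmits up to k tuples, so traffic grows with the number of sites; reducing communication rounds and transmitted data is the concern the abstract describes.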
