Preference-Based Top-k Representative Skyline Queries on Uncertain Databases

Top-k representative skyline queries are important for multi-criteria decision-making applications because they give data analysts an intuitive way to identify the k most significant objects. Despite their importance, top-k representative skyline queries have not received adequate attention from the research community, and existing work on the problem considers only certain (deterministic) data. In this paper, we therefore present the first study on processing top-k representative skyline queries in uncertain databases based on user-defined preferences regarding the priority of individual dimensions. We also apply the odds ratio to restrict the cardinality of the result set, instead of relying on a probability threshold that may be difficult for an end user to define. We then develop two novel algorithms for answering top-k representative skyline queries on uncertain data, and we propose several pruning conditions to improve their efficiency. Performance evaluations on both real-life and synthetic datasets demonstrate the efficiency, effectiveness and scalability of the proposed approaches.
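To make the setting concrete, the following is a minimal sketch of the standard probabilistic-skyline building blocks that such queries rest on. It is not the authors' algorithm. It assumes the common discrete uncertainty model: each object is a set of mutually exclusive instances with existence probabilities, objects are independent, and smaller values are better in every dimension. The instance skyline probability is the instance's own probability times, for every other object, the probability that none of that object's instances dominates it. The names `skyline_probabilities`, `top_k` and `odds` are hypothetical. The odds p/(1-p) and the ratio of two objects' odds are shown only as a generic illustration of an odds ratio; they are not the paper's cardinality criterion, and the dimension-priority preferences are not modeled here.

```python
from math import prod
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
# Hypothetical model: an uncertain object is a list of (instance, probability)
# pairs; instances of one object are mutually exclusive.
UncertainObject = List[Tuple[Point, float]]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(objects: Dict[str, UncertainObject]) -> Dict[str, float]:
    """Skyline probability of each object, assuming independent objects."""
    result = {}
    for name, instances in objects.items():
        total = 0.0
        for inst, p in instances:
            # For each other object, the chance that none of its instances dominates inst.
            survive = prod(
                1.0 - sum(q for other_inst, q in other if dominates(other_inst, inst))
                for other_name, other in objects.items()
                if other_name != name
            )
            total += p * survive
        result[name] = total
    return result


def top_k(probs: Dict[str, float], k: int) -> List[str]:
    """Plain ranking by skyline probability (illustrative, not a representative selection)."""
    return [name for name, _ in sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]]


def odds(p: float) -> float:
    """Odds p / (1 - p); infinite when p reaches 1."""
    return float("inf") if p >= 1.0 else p / (1.0 - p)


# Hypothetical two-dimensional example data.
example = {
    "A": [((1, 4), 0.5), ((3, 3), 0.5)],
    "B": [((2, 2), 1.0)],
    "C": [((4, 1), 0.6), ((5, 5), 0.4)],
}

probs = skyline_probabilities(example)
print(probs)                                   # {'A': 0.5, 'B': 1.0, 'C': 0.6}
print(top_k(probs, 2))                         # ['B', 'C']
print(odds(probs["C"]) / odds(probs["A"]))     # odds ratio of C to A, about 1.5
```

In this example, instance (3, 3) of A and instance (5, 5) of C are certainly dominated by B's single instance, so only their other instances contribute. A real query processor would replace the quadratic scan with index-based pruning.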
