A Model for Processing Skyline Queries over a Database with Missing Data

Skyline queries provide a flexible query operator that returns the data items (skylines) that are not dominated by any other data item over the dimensions (attributes) of the database. Most existing skyline techniques determine the skylines by assuming that the values of all dimensions are available (complete) for every data item. However, this assumption does not always hold, particularly for multidimensional databases, where some values may be missing. Incompleteness of data leads to the loss of the transitivity property of the skyline technique and causes dominance tests to fail, because some data items are incomparable to each other. Furthermore, incompleteness negatively affects the process of finding skylines, leading to high overhead due to exhaustive pairwise comparisons between data items. This paper proposes a model for processing skyline queries over incomplete data, with the aim of avoiding cyclic dominance when deriving skylines. The proposed model consists of four components: Data Clustering Builder, Group Constructor and Local Skylines Identifier, k-dom Skyline Generator, and Incomplete Skylines Identifier. Together, these components optimize the process of identifying skylines in an incomplete database by reducing the number of pairwise comparisons needed, eliminating dominated data items as early as possible before the skyline technique is applied.
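To make the loss of transitivity concrete, the following minimal sketch shows how cyclic dominance can arise once values are missing. It is an illustration only, not the proposed four-component model. It assumes that smaller values are preferred, that missing values are represented by `None`, and that two items are compared only on the dimensions where both have a value (a convention commonly used for incomplete data). The names `dominates` and `naive_skyline` are hypothetical.

```python
from typing import List, Optional, Sequence

Item = Sequence[Optional[float]]  # None marks a missing value (assumption)


def dominates(p: Item, q: Item) -> bool:
    """Return True if p dominates q on the dimensions both items share.

    Assumes smaller is better. Dimensions where either value is missing
    are skipped, so two items with no common dimension are incomparable.
    """
    strictly_better = False
    for a, b in zip(p, q):
        if a is None or b is None:
            continue  # not comparable on this dimension
        if a > b:
            return False  # p is worse somewhere, so it cannot dominate q
        if a < b:
            strictly_better = True
    return strictly_better


def naive_skyline(items: List[Item]) -> List[Item]:
    """Exhaustive pairwise skyline: keep items dominated by no other item."""
    return [
        p for p in items
        if not any(dominates(q, p) for q in items if q is not p)
    ]


# Three incomplete items that dominate each other in a cycle.
p1 = (1, None, 10)
p2 = (3, 2, None)
p3 = (None, 5, 3)

print(dominates(p1, p2), dominates(p2, p3), dominates(p3, p1))
print(naive_skyline([p1, p2, p3]))
```

Here `p1` dominates `p2` on the first dimension, `p2` dominates `p3` on the second, and `p3` dominates `p1` on the third, so dominance is no longer transitive and the exhaustive pairwise test leaves no item in the skyline. On complete data the same function behaves as expected: `naive_skyline([(1, 2), (2, 1), (3, 3)])` keeps only the first two items.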
