Deriving skyline points over dynamic and incomplete databases

As data volumes keep growing, retrieving the results that best match a user's preferences becomes essential. Skyline queries were introduced for this purpose: they return the data items that are not dominated by any other data item in the database (the skylines). Most existing skyline approaches assume that the database is static and complete. In real-world scenarios, however, databases are often incomplete, especially multidimensional databases in which some dimensions may have missing values. Databases may also be dynamic: new data items are inserted while existing data items are deleted or updated. Blindly repeating pairwise comparisons over the entire set of data items after each change is wasteful, because not all data items need to be compared to identify the skylines. This study therefore proposes a novel skyline algorithm, DInSkyline, which finds the most relevant data items in dynamic and incomplete databases. Several experiments have been conducted, and the results show that DInSkyline outperforms previous work by reducing the number of pairwise comparisons by 52% to 73%.
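
To make the notions of dominance and pairwise comparison concrete, the following is a minimal sketch of the naive baseline that the abstract argues against. It is not DInSkyline itself. It assumes that smaller values are preferred, that a missing value is represented by `None`, and that two items are compared only on the dimensions where both have a value, a convention often used for incomplete data. The names `dominates` and `naive_skyline` are hypothetical and chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# A data item is a sequence of values; None marks a missing value (assumption).
Item = Sequence[Optional[float]]


def dominates(p: Item, q: Item) -> bool:
    """Return True if p dominates q on the dimensions both items have.

    Assumes smaller is better. p dominates q when p is no worse than q on
    every common dimension and strictly better on at least one of them.
    Items with no common dimension never dominate each other.
    """
    strictly_better = False
    for a, b in zip(p, q):
        if a is None or b is None:
            continue  # skip dimensions where either value is missing
        if a > b:
            return False  # p is worse on a common dimension
        if a < b:
            strictly_better = True
    return strictly_better


def naive_skyline(items: List[Item]) -> Tuple[List[Item], int]:
    """Compute the skyline by comparing every item against all others.

    Returns the skyline and the number of pairwise comparisons performed.
    Each item is checked against the whole data set rather than only the
    current skyline, because dominance over incomplete data is not
    guaranteed to be transitive.
    """
    comparisons = 0
    skyline: List[Item] = []
    for i, q in enumerate(items):
        dominated = False
        for j, p in enumerate(items):
            if i == j:
                continue
            comparisons += 1
            if dominates(p, q):
                dominated = True
                break
        if not dominated:
            skyline.append(q)
    return skyline, comparisons


if __name__ == "__main__":
    data = [(1, None, 3), (2, 2, None), (None, 1, 4), (3, 3, 5)]
    result, count = naive_skyline(data)
    print("skyline:", result)
    print("pairwise comparisons:", count)
```

Rerunning `naive_skyline` over all items after every insertion, deletion, or update is the kind of blind recomputation the abstract describes; the comparison counter shows the cost that an incremental approach would aim to reduce.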
