Missing values estimation for skylines in incomplete database

Incompleteness of data is a common problem in many databases, including heterogeneous web databases, multi-relational databases, and spatial and temporal databases, as well as in data integration. Incomplete data makes query processing challenging, because returning accurate results that best meet the query conditions over an incomplete database is not a trivial task. Several techniques have been proposed to process queries in incomplete databases. Some of these techniques retrieve query results based on the existing values rather than estimating the missing values. Such techniques are undesirable in many cases, as the dimensions with missing values might be the important dimensions of the user's query. Moreover, the output is incomplete and might not satisfy the user's preferences. In this paper, we propose an approach that estimates missing values in skylines to guide users in selecting the most appropriate skylines from several candidate skylines. The approach mines attribute correlations to generate Approximate Functional Dependencies (AFDs) that capture the relationships between the dimensions. It also identifies the strength of the probabilistic correlations in order to estimate the missing values. The skylines with estimated values are then ranked, which ensures that the retrieved skylines are ordered by their estimated precision.
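To make the estimate-then-rank pipeline concrete, the following is a minimal Python sketch that rests on several assumptions the abstract does not state. It assumes the candidate skylines are already computed and the attributes are categorical. AFD strength is scored with the 1 − g3 measure over single-attribute left-hand sides, and a missing value is estimated as its most probable value given the strongest determining attribute. A skyline's "estimated precision" is taken to be the product of the probabilities of its estimated values. The function names (`afd_confidence`, `estimate`, `rank_skylines`) and the toy car relation are hypothetical illustrations, not the paper's method or data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]  # None marks a missing value


def afd_confidence(rows: List[Row], lhs: str, rhs: str) -> float:
    """Assumed AFD score for lhs -> rhs, measured as 1 - g3: the fraction of
    tuples (with both values present) that agree with the most frequent rhs
    value of their lhs group."""
    groups: Dict[str, Counter] = defaultdict(Counter)
    for r in rows:
        if r[lhs] is not None and r[rhs] is not None:
            groups[r[lhs]][r[rhs]] += 1
    total = sum(sum(c.values()) for c in groups.values())
    if total == 0:
        return 0.0
    agreeing = sum(c.most_common(1)[0][1] for c in groups.values())
    return agreeing / total


def estimate(rows: List[Row], row: Row, target: str, lhs: str) -> Tuple[Optional[str], float]:
    """Estimate row[target] as the most probable value given row[lhs],
    returning (value, P(value | lhs value))."""
    if row[lhs] is None:
        return None, 0.0
    counts = Counter(r[target] for r in rows
                     if r[lhs] == row[lhs] and r[target] is not None)
    if not counts:
        return None, 0.0
    value, n = counts.most_common(1)[0]
    return value, n / sum(counts.values())


def rank_skylines(rows: List[Row], skylines: List[Row], attrs: List[str]) -> List[Tuple[float, Row]]:
    """Fill each skyline's missing values and rank by estimated precision,
    assumed here to be the product of the estimates' probabilities."""
    ranked = []
    for s in skylines:
        filled, precision = dict(s), 1.0
        for target in (a for a in attrs if s[a] is None):
            # Choose the present attribute whose AFD to target is strongest.
            present = [a for a in attrs if a != target and s[a] is not None]
            if not present:
                precision = 0.0
                continue
            lhs = max(present, key=lambda a: afd_confidence(rows, a, target))
            filled[target], p = estimate(rows, s, target, lhs)
            precision *= p
        ranked.append((precision, filled))
    ranked.sort(key=lambda t: t[0], reverse=True)  # most precise first
    return ranked


# Hypothetical toy relation: complete tuples plus two candidate skylines
# whose "body" dimension is missing.
complete = [
    {"model": "A4", "make": "Audi", "body": "sedan"},
    {"model": "A4", "make": "Audi", "body": "sedan"},
    {"model": "A4", "make": "Audi", "body": "wagon"},
    {"model": "Q5", "make": "Audi", "body": "suv"},
    {"model": "Civic", "make": "Honda", "body": "sedan"},
    {"model": "Civic", "make": "Honda", "body": "sedan"},
]
skylines = [
    {"model": "A4", "make": "Audi", "body": None},
    {"model": "Civic", "make": "Honda", "body": None},
]
ranked = rank_skylines(complete + skylines, skylines, ["model", "make", "body"])
for precision, t in ranked:
    print(f"{precision:.2f} {t}")
```

On this toy data, `model -> body` (confidence 5/6) beats `make -> body` (4/6), so both skylines are filled with `sedan`. The Civic skyline ranks first with precision 1.00, and the A4 skyline follows with 0.67, because only two of the three A4 tuples are sedans. Scoring the AFDs on the whole relation, rather than per query, keeps the sketch short; a practical system would likely mine the dependencies once offline and reuse them.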
