ESTIMATING MISSING VALUES OF SKYLINES IN INCOMPLETE DATABASE

Incomplete data is a common problem in many kinds of databases, including heterogeneous web databases, multi-relational databases, and spatial and temporal databases, as well as in data integration. Incompleteness complicates query processing, because returning accurate results that best meet the query conditions over an incomplete database is not a trivial task. Several techniques have been proposed for processing queries over incomplete databases. Some of them compute query results from the existing values only, without estimating the missing ones. This is undesirable in many cases, since the dimensions with missing values may be the very dimensions that matter most to the user's query; moreover, the output remains incomplete and may not satisfy the user's preferences. In this paper, we propose an approach that estimates the missing values in skylines to help users select the most appropriate skylines among several candidate skylines. The approach mines attribute correlations to generate Approximate Functional Dependencies (AFDs) that capture the relationships between the dimensions, and it measures the strength of the probabilistic correlations to estimate the missing values. The skylines with estimated values are then ranked, which ensures that the retrieved skylines are returned in order of their estimated precision.
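The estimation and ranking steps described above can be illustrated with a small Python sketch. It is not the authors' algorithm; it rests on several assumptions: AFDs have a single determining attribute; each AFD is scored with a common confidence measure (the fraction of complete tuples whose dependent value equals the majority value for their determining value); a missing value is filled with the most probable value conditioned on the highest-confidence AFD whose determining attribute is known; and skylines are ranked by the product of the probabilities of their estimated values, which treats the estimates as independent. The skyline tuples are taken as given, and all function names and the car data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical row type: attribute name -> value, with None marking a missing value.
Row = dict[str, Optional[str]]


def afd_confidence(rows: list[Row], lhs: str, rhs: str) -> float:
    """Confidence of the approximate dependency lhs -> rhs, computed over tuples
    where both attributes are present: the fraction of those tuples whose rhs
    value equals the most frequent rhs value for their lhs value."""
    groups: dict[str, Counter] = defaultdict(Counter)
    for r in rows:
        if r.get(lhs) is not None and r.get(rhs) is not None:
            groups[r[lhs]][r[rhs]] += 1
    total = sum(sum(c.values()) for c in groups.values())
    if total == 0:
        return 0.0
    agreeing = sum(c.most_common(1)[0][1] for c in groups.values())
    return agreeing / total


def estimate(rows: list[Row], t: Row, rhs: str,
             attrs: list[str]) -> Optional[tuple[str, float]]:
    """Estimate t[rhs] from the highest-confidence AFD lhs -> rhs whose lhs is
    known in t. Returns (value, P(value | t[lhs])) or None if no AFD applies."""
    usable = [a for a in attrs if a != rhs and t.get(a) is not None]
    if not usable:
        return None
    lhs = max(usable, key=lambda a: afd_confidence(rows, a, rhs))
    # Conditional distribution of rhs among tuples sharing t's lhs value.
    dist = Counter(r[rhs] for r in rows
                   if r.get(lhs) == t[lhs] and r.get(rhs) is not None)
    if not dist:
        return None
    value, count = dist.most_common(1)[0]
    return value, count / sum(dist.values())


def rank_skylines(rows: list[Row], skylines: list[Row],
                  attrs: list[str]) -> list[tuple[float, Row]]:
    """Fill missing values in each skyline tuple and rank the tuples by the
    product of the probabilities of their estimated values (independence assumed).
    Complete tuples score 1.0; a tuple with an unestimable value scores 0.0."""
    ranked = []
    for t in skylines:
        filled, score = dict(t), 1.0
        for a in attrs:
            if t.get(a) is None:
                est = estimate(rows, t, a, attrs)
                if est is None:
                    score = 0.0
                    continue
                filled[a], p = est
                score *= p
        ranked.append((score, filled))
    # Highest estimated precision first; sort is stable for equal scores.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


rows = [
    {"make": "Honda", "model": "Civic", "body": "sedan"},
    {"make": "Honda", "model": "Civic", "body": "sedan"},
    {"make": "Honda", "model": "Civic", "body": "coupe"},
    {"make": "Toyota", "model": "Camry", "body": "sedan"},
    {"make": "Toyota", "model": "Camry", "body": "sedan"},
    {"make": None, "model": "Camry", "body": "sedan"},
    {"make": "Honda", "model": "Civic", "body": None},
]
attrs = ["make", "model", "body"]
# Hypothetical skyline candidates: two incomplete tuples and one complete tuple.
ranked = rank_skylines(rows, [rows[6], rows[5], rows[3]], attrs)
for score, t in ranked:
    print(f"{score:.2f}", t)
```

In this toy data the tuple missing `make` is filled with `Toyota` via the AFD `model → make` (confidence 1.0) with probability 1.0, while the tuple missing `body` is filled with `sedan` with probability 2/3 and therefore ranks last. A real implementation would also consider multi-attribute determining sets and prune low-confidence AFDs.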
