Determining the Currency of Data

Data in real-life databases become obsolete rapidly. One often finds that multiple values of the same entity reside in a database. While all of these values were once correct, most of them may have become stale and inaccurate. Worse still, the values often do not carry reliable timestamps. This motivates the study of data currency: identifying the current value of an entity in a database, and answering queries with current values, in the absence of timestamps. This paper investigates the currency of data. (1) We propose a model that specifies partial currency orders in terms of simple constraints. The model also allows us to express which values are copied from other data sources, carrying over the currency orders of those sources, in terms of copy functions defined on correlated attributes. (2) We study fundamental problems for data currency: determining whether a specification is consistent, whether a value is more current than another, and whether a query answer is certain no matter how partial currency orders are completed. (3) Moreover, we identify several problems associated with copy functions: deciding whether a copy function imports sufficient current data to answer a query, whether such a function copies redundant data, whether a copy function can be extended to import the current data necessary for a query while respecting the constraints, and whether it suffices to copy data of a bounded size. (4) We establish matching upper and lower bounds on these problems, for both combined complexity and data complexity, and for a variety of query languages. We also identify special cases that warrant lower complexity.
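To make the notions in (2) concrete, the following is a minimal sketch and not the paper's formal model. It assumes a single entity whose values are related by known "older, newer" pairs, and it enumerates every total order that completes those pairs. The function names (`completions`, `is_consistent`, `certainly_more_current`, `certain_current_value`) and the marital-status data are hypothetical, and the brute-force enumeration is exponential, so it is only meant for tiny illustrations.

```python
from itertools import permutations


def completions(values, known_pairs):
    """Yield each total order (least current first) that respects known_pairs.

    known_pairs holds (older, newer) tuples: a hypothetical stand-in for a
    partial currency order on the values of one entity.
    """
    for perm in permutations(values):
        pos = {v: i for i, v in enumerate(perm)}
        if all(pos[older] < pos[newer] for older, newer in known_pairs):
            yield perm


def is_consistent(values, known_pairs):
    """The specification is consistent if at least one completion exists."""
    return next(completions(values, known_pairs), None) is not None


def certainly_more_current(values, known_pairs, older, newer):
    """True if `newer` follows `older` in every completion (and one exists)."""
    found = False
    for perm in completions(values, known_pairs):
        found = True
        if perm.index(older) > perm.index(newer):
            return False
    return found


def certain_current_value(values, known_pairs):
    """Return the value that is most current in all completions, else None."""
    tops = {perm[-1] for perm in completions(values, known_pairs)}
    return tops.pop() if len(tops) == 1 else None


# Hypothetical example: marital status values of one person, no timestamps.
statuses = ["single", "married", "divorced"]
full = {("single", "married"), ("married", "divorced")}
partial = {("single", "married")}
cyclic = {("single", "married"), ("married", "single")}

print(is_consistent(statuses, full))            # a completion exists
print(certain_current_value(statuses, full))    # one value tops every completion
print(certain_current_value(statuses, partial)) # completions disagree on the top
print(is_consistent(statuses, cyclic))          # contradictory pairs
```

With only the pair `("single", "married")` known, the completions disagree on whether `married` or `divorced` is most current, so no current value is certain. This is the situation in which importing additional ordering information, for example through a copy function, could help.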
