Abstract Versus Concrete Temporal Query Languages

Jan Chomicki, University at Buffalo, USA, http://www.cse.buffalo.edu/~chomicki
David Toman, University of Waterloo, Canada, http://www.cs.uwaterloo.ca/~david

SYNONYMS

historical query languages

DEFINITION

Temporal query languages are a family of query languages designed to query (and access in general) time-dependent information stored in temporal databases. The languages are commonly defined as extensions of standard query languages for non-temporal databases with temporal features. The additional features reflect the way dependencies of data on time are captured by and represented in the underlying temporal data model.

HISTORICAL BACKGROUND

Most databases store time-varying information. On the other hand, SQL is often the language of choice for developing applications that utilize the information in these databases. Plain SQL, however, does not seem to provide adequate support for temporal applications.

Example. To represent the employment histories of persons, a common relational design would use a schema Employment(FromDate, ToDate, EID, Company), with the intended meaning that a person identified by EID worked for Company continuously from FromDate to ToDate. Note that while the above schema is a standard relational schema, the additional assumption that the values of the attributes FromDate and ToDate represent continuous periods of time is itself not a part of the relational model. Formulating even simple queries over such a schema is non-trivial: for example, the query GAPS: “List all persons with gaps in their employment history, together with the gaps” leads to a rather complex formulation in, e.g., SQL over the above schema (this is left as a challenge to readers who consider themselves SQL experts; for a list of appealing, but incorrect solutions, including the reasons why, see [9]).
The difficulty arises because a single tuple in the relation is conceptually a compact representation of a set of tuples, each tuple stating that an employment fact was true on a particular day. The tension between the conceptual abstract temporal data model (in the example, the property that employment facts are associated with individual time instants) and the need for an efficient and compact representation of temporal data (in the example, the representation of continuous periods by their start and end instants) has been reflected in the development of numerous temporal data models and temporal query languages [3].

SCIENTIFIC FUNDAMENTALS

Temporal query languages are commonly defined using temporal extensions of existing non-temporal query languages, such as relational calculus, relational algebra, or SQL. The temporal extensions can be categorized in two, mostly orthogonal, ways:

• The choice of the actual temporal values manipulated by the language. This choice is primarily determined by the underlying temporal data model. The model also determines the associated operations on these values. The meaning of temporal queries is then defined in terms of temporal values and operations on them, and their interactions with data (non-temporal) values in a temporal database.
• The choice of syntactic constructs to manipulate temporal values in the language. This distinction determines whether the temporal values in the language are accessed and manipulated explicitly, in a way similar to other values stored in the database, or whether the access is implicit, based primarily on temporally extending the meaning of constructs that already exist in the underlying non-temporal language (while still using the operations defined by the temporal data model).

Additional design considerations relate to compatibility with existing query languages, e.g., the notion of temporal upward compatibility.
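The relationship between the compact encoding and the conceptual model in the Employment example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the function names to_abstract and gaps, the day granularity, and the treatment of FromDate and ToDate as inclusive endpoints are all hypothetical and do not come from the entry. The sketch first expands each interval tuple into one tuple per day and then evaluates GAPS directly on the expanded relation, reporting each gap as an individual day.

```python
from datetime import date, timedelta

# Hypothetical sample rows in the schema Employment(FromDate, ToDate, EID, Company).
# Assumption: both endpoints are inclusive and the granularity is one day.
CONCRETE = [
    (date(2020, 1, 1), date(2020, 1, 3), "e1", "Acme"),
    (date(2020, 1, 6), date(2020, 1, 7), "e1", "Initech"),
    (date(2020, 1, 2), date(2020, 1, 5), "e2", "Acme"),
]


def to_abstract(concrete):
    """Expand each interval tuple into one (Date, EID, Company) tuple per day."""
    abstract = set()
    for from_date, to_date, eid, company in concrete:
        day = from_date
        while day <= to_date:
            abstract.add((day, eid, company))
            day += timedelta(days=1)
    return abstract


def gaps(abstract):
    """Return (EID, Date) pairs for days on which a person was not employed
    but was employed on some earlier day and on some later day."""
    days_by_eid = {}
    for day, eid, _company in abstract:
        days_by_eid.setdefault(eid, set()).add(day)

    result = set()
    for eid, days in days_by_eid.items():
        day, last = min(days), max(days)
        # Every unemployed day strictly between the first and last employed
        # day has an employed day before it and an employed day after it.
        while day < last:
            if day not in days:
                result.add((eid, day))
            day += timedelta(days=1)
    return result


if __name__ == "__main__":
    for eid, day in sorted(gaps(to_abstract(CONCRETE))):
        print(eid, day.isoformat())
```

Evaluating the query on the expanded relation is simple but can be very costly for long periods or fine granularities, which is one reason compact encodings are used in practice.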
However, as illustrated above, an additional hurdle stems from the fact that many (early) temporal query languages allowed the users to manipulate a finite underlying representation of temporal databases rather than the actual temporal values/objects in the associated temporal data model. A typical example of this situation would be an approach in which the temporal data model is based on time instants, while the query language introduces interval-valued attributes. Such a discrepancy often leads to a complex and unintuitive semantics of queries.

In order to clarify this issue, Chomicki introduced the notions of abstract and concrete temporal databases and query languages [2]. Intuitively, abstract temporal query languages are defined at the conceptual level of the temporal data model, while their concrete counterparts operate directly on an actual compact encoding of temporal databases. The relationship between abstract and concrete temporal query languages is also implicitly present in the notion of snapshot equivalence [7]. Moreover, Bettini et al. [1] proposed to distinguish between explicit and implicit information in a temporal database. The explicit information is stored in the database and used to derive the implicit information through semantic assumptions. Semantic assumptions about fact persistence play a role similar to mappings between concrete and abstract databases, while other assumptions are used to address time-granularity issues.

Abstract Temporal Query Languages

Most temporal query languages derived by temporally extending the relational calculus can be classified as abstract temporal query languages. Their semantics is defined in terms of abstract temporal databases which, in turn, are typically defined within the point-stamped temporal data model, in particular without any additional hidden assumptions about the meaning of tuples in instances of temporal relations.

Example.
The employment histories in an abstract temporal data model would most likely be captured by a simpler schema “Employment(Date, EID, Company)”, with the intended meaning that a person identified by EID was working for Company on a particular Date. While instances of such a schema can be potentially very large (especially when a fine granularity of time is used), formulating queries is now much more natural.

Choosing abstract temporal query languages over concrete ones resolves the first design issue: the temporal values used by the former languages are time instants equipped with an appropriate temporal ordering (which is typically a linear order over the instants), and possibly other predicates such as temporal distance. The second design issue, access to temporal values, may be resolved in two different ways, as exemplified by the following two different query languages:

• Temporal Relational Calculus (TRC): a two-sorted first-order logic with variables and quantifiers explicitly ranging over the time and data domains (see the entry Temporal Relational Calculus).
• First-order Temporal Logic (FOTL): a language with an implicit access to timestamps using temporal connectives (see the entry Temporal Logic in Database Query Languages).

Example. The GAPS query is formulated as follows:

TRC: ∃t1, t3. t1 < t2 < t3 ∧ ∃c.Employment(t1, x, c) ∧ (¬∃c.Employment(t2, x, c)) ∧ ∃c.Employment(t3, x, c)
FOTL: ◆∃c.Employment(x, c) ∧ (¬∃c.Employment(x, c)) ∧ ◇∃c.Employment(x, c)

Here, the explicit access to temporal values (in TRC) using the variables t1, t2, and t3 can be contrasted with the implicit access (in FOTL) using the temporal operators ◆ (read “sometime in the past”) and ◇ (read “sometime in the future”). The conjunction in the FOTL query represents an implicit temporal join. The formulation in