Reasoning about strings in databases

To enable the database programmer to reason about relations over strings of arbitrary length, we introduce alignment logic, a modal extension of relational calculus. In addition to relations, a state in the model consists of a two-dimensional array in which the strings are aligned on top of each other. The basic modality of the language (a transpose, or “slide”) rearranges the alignment. More complex formulas are formed with a syntax reminiscent of regular expressions, together with the usual connectives and quantifiers. The computational counterpart of the string-based portion of the logic turns out to be the class of multitape two-way finite state automata, which are particularly well suited to implementing string matching. A computational counterpart of the full logic is obtained from relational algebra by extending the selection operator with filters based on these multitape machines. For formulas in alignment logic, safety implies that any new strings generated from old ones must be of bounded length. Although this boundedness property is undecidable in general, it is decidable for an important subclass of formulas. In terms of expressive power, alignment logic subsumes previous proposals for querying string databases and achieves full Turing computability. The language can also be restricted so as to define exactly the regular sets, or exactly the sets in the polynomial hierarchy.
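
To make the automaton side of the abstract more concrete, here is a minimal Python sketch of a deterministic two-tape two-way finite automaton, along with a selection operator that uses such a machine as its filter. It is not the paper's construction. The names `run`, `reverse_delta`, `is_reversal` and `select`, the end-markers `<` and `>`, and the encoding of the transition table as a Python function are illustrative assumptions. The example machine accepts a pair (x, y) exactly when y is x reversed. It relies on two-way head motion: head 1 first runs to the right end of x, then reads x backwards while head 2 reads y forwards.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

LEFT, RIGHT = "<", ">"  # hypothetical end-markers; assumed not to occur inside the strings

Step = Optional[Tuple[str, Tuple[int, ...]]]      # (next state, head moves) or None
Delta = Callable[[str, Tuple[str, ...]], Step]   # transition function (assumed encoding)


def run(delta: Delta, start: str, accepting: Set[str], strings: Sequence[str]) -> bool:
    """Simulate a deterministic k-tape two-way automaton on k input strings."""
    tapes = [LEFT + s + RIGHT for s in strings]
    heads = [0] * len(tapes)          # every head starts on its left end-marker
    state = start
    seen = set()                      # visited configurations (state, head positions)
    while state not in accepting:
        config = (state, tuple(heads))
        if config in seen:            # finitely many configurations: a repeat means a loop
            return False
        seen.add(config)
        step = delta(state, tuple(t[h] for t, h in zip(tapes, heads)))
        if step is None:              # no transition defined: reject
            return False
        state, moves = step
        heads = [h + m for h, m in zip(heads, moves)]   # each move is -1, 0 or +1
        if any(not 0 <= h < len(t) for h, t in zip(heads, tapes)):
            return False              # a head fell off its tape: reject
    return True


def reverse_delta(state: str, symbols: Tuple[str, ...]) -> Step:
    """Hypothetical two-tape machine accepting (x, y) iff y is x reversed."""
    a, b = symbols
    if state == "seek":               # run head 1 to the right end of x; head 2 waits
        if a != RIGHT:
            return "seek", (+1, 0)
        return "cmp", (-1, +1)        # turn head 1 around and start head 2 on y
    if state == "cmp":                # head 1 reads x leftward, head 2 reads y rightward
        if a == LEFT and b == RIGHT:
            return "acc", (0, 0)      # both strings fully consumed: accept
        if a == b and a not in (LEFT, RIGHT):
            return "cmp", (-1, +1)
    return None


def is_reversal(pair: Sequence[str]) -> bool:
    return run(reverse_delta, "seek", {"acc"}, pair)


def select(relation: List[Tuple[str, ...]], columns: Sequence[int],
           machine: Callable[[Sequence[str]], bool]) -> List[Tuple[str, ...]]:
    """Selection whose condition is a multitape machine run on the chosen columns."""
    return [row for row in relation if machine([row[c] for c in columns])]


R = [("abc", "cba"), ("abc", "abc"), ("gat", "tag")]
print(select(R, (0, 1), is_reversal))  # [('abc', 'cba'), ('gat', 'tag')]
```

For fixed inputs, a deterministic machine has only finitely many configurations. This is why `run` can detect non-termination simply by remembering the configurations it has visited. On the relational side, the filter is ordinary selection whose condition happens to be a machine run over the selected columns.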
