Semantics analysis through elementary meanings: theoretical foundation for generalized thesaurus construction

This paper develops a database query language called Transducer Datalog motivated by the needs of a new and emerging class of database applications. In these applications, such as text databases and genome databases, the storage and manipulation of long character sequences is a crucial feature. The issues involved in managing this kind of data are not addressed by traditional database systems, either in theory or in practice. To address these issues, we recently introduced a new machine model called a generalized sequence transducer. These generalized transducers extend ordinary transducers by allowing them to invoke other transducers as “subroutines.” This paper establishes the computational properties of Transducer Datalog, a query language based on this new machine model. In the process, we develop a hierarchy of time-complexity classes based on the Ackermann function. The lower levels of this hierarchy correspond to well-known complexity classes, such as polynomial time and hyper-exponential time. We establish a tight relationship between levels in this hierarchy and the depth of subroutine calls within Transducer Datalog programs. Finally, we show that Transducer Datalog programs of arbitrary depth express exactly the sequence functions computable in primitive-recursive time.
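The idea of one transducer invoking another as a subroutine can be illustrated with a small sketch. This is only an informal illustration, not the paper's formal machine model: the function names (`copy_transducer`, `append_transducer`, `square_transducer`) are hypothetical, and the convention that a subroutine receives the caller's input together with its current output, and that its result replaces that output, is an assumption made for the example. The sketch shows why deeper subroutine calls permit faster-growing outputs: the depth-1 machine produces output linear in its input, while the depth-2 machine produces output of quadratic length.

```python
def copy_transducer(inp: str) -> str:
    """Depth 1 (ordinary transducer): emit one output symbol per input symbol."""
    out = []
    for sym in inp:
        out.append(sym)
    return "".join(out)


def append_transducer(inp: str, current: str) -> str:
    """Depth-1 subroutine (assumed convention): read the caller's current
    output and the input, and return the output extended by a copy of the input."""
    return copy_transducer(current) + copy_transducer(inp)


def square_transducer(inp: str) -> str:
    """Depth 2: for every input symbol consumed, call the depth-1 subroutine
    and replace the output with its result. Output length is len(inp) ** 2."""
    out = ""
    for _ in inp:
        out = append_transducer(inp, out)
    return out


if __name__ == "__main__":
    for word in ["a", "ab", "abcd"]:
        print(len(word), len(copy_transducer(word)), len(square_transducer(word)))
```

Nesting a further level, where each step calls a depth-2 machine on the output so far, would grow faster again, which is the informal intuition behind tying call depth to levels of a complexity hierarchy.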
