Management of Sequence Data

One of the challenges facing today's database systems is the need to support complex data types, which are of growing importance in new application areas. The thesis addresses this problem, with a specific focus on supporting sequence data. A large part of the thesis deals with the details of sequences. Issues covered include the model for sequence data, an algebra of operators to query the data, a query language to express the queries, optimization techniques and query processing algorithms. Performance results are presented from an implementation of these ideas, demonstrating the effects of the various optimizations. This detailed exploration of sequence data is one contribution of the thesis. The second contribution is a solution to the problem of integrating different data types, including sequences and relations, in a general-purpose database system. The thesis discusses the drawbacks of existing solutions, and then proposes a solution based on a novel E-ADT paradigm. This paradigm has been used in the development of the PREDATOR database system, and the implementation brings to light several advantages as well as limitations of this paradigm. The support for sequences has been implemented as a component of this larger system. The conclusion drawn from the sequence implementation is that it is important to provide specialized support for queries over sequences. By extrapolation, similar conclusions may be drawn about support for other complex data types like matrixes and images. The E-ADT paradigm provides a mechanism to integrate all these types within a single general-purpose database system. The conclusion of the thesis is that database systems should adopt some of the E-ADT ideas so as to provide more efficient support for complex data types like sequences.
