An object-oriented genetics information system

Sequence data is being produced by genomic sequencing laboratories at ever-increasing rates, making it impossible for individual researchers to keep track of all the new data that might affect their research. Computer systems are needed so that researchers can access this data. The systems must support high-level interfaces that communicate in the language of the researchers, database systems that guarantee availability and consistency of the data, and powerful search systems that rapidly scan for similarities between sequences. We have developed a prototype system that includes a graphical user interface, an object-oriented database management system, and high-performance similarity search algorithms. The prototype has the potential to increase researchers’ productivity by automating entry of annotated sequence fragments as they are produced by sequencing machines, storing the fragments in the database, and automatically producing and displaying similarity search results of new sequences against the large public sequence databases GenBank and PIR. This paper describes the prototype, discusses the benefits of object-oriented databases for complex and changing sequence data, and presents an object-oriented schema for genetic information. Graphical tools for annotating sequences, storing them in the database, automating similarity searches, and viewing similarity search results are presented. A new suffix tree-based data structure that supports rapid similarity searches on sequence data is introduced. Finally, future plans for the system are discussed.
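The abstract does not describe the new suffix tree-based structure itself, so the following is only a minimal sketch of the general idea behind suffix-based indexing, not the paper's data structure. It builds a naive, uncompressed suffix trie (quadratic in time and space, unlike a true suffix tree) and uses it to locate exact occurrences of a query string. The names `Node`, `build_suffix_trie`, and `find_occurrences` are hypothetical and chosen for illustration.

```python
class Node:
    """One trie node; hypothetical helper, not taken from the paper."""

    def __init__(self):
        self.children = {}  # maps a character to the child Node
        self.starts = []    # start offsets of suffixes passing through this node


def build_suffix_trie(text):
    """Insert every suffix of text, character by character (O(n^2) sketch)."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.starts.append(start)
    return root


def find_occurrences(root, pattern):
    """Return sorted start offsets where pattern occurs exactly in the text.

    An empty pattern returns an empty list in this sketch.
    """
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return sorted(node.starts)


if __name__ == "__main__":
    trie = build_suffix_trie("GATTACAGATT")
    print(find_occurrences(trie, "GATT"))  # offsets of exact matches
```

Once the index is built, a lookup costs time proportional to the query length rather than the text length, which is the property that makes suffix-based structures attractive for scanning large sequence collections. Exact matching is only a building block here; similarity search as described in the abstract would additionally need scoring and inexact matching, which this sketch does not attempt.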
