MoBIoS: a metric-space DBMS to support biological discovery

MoBIoS is a specialized database management system whose storage manager is based on metric-space indexing and whose query language includes biological data types. When relational database management systems are used to store biological data, important data types are relegated to BLOB and unstructured text fields. As a result, even simple but critical queries are executed by sequentially dumping the data to utilities outside the database. MoBIoS provides O(log n) physical access to diverse biological data types, as well as uniform logical and syntactic access to them. Consequently, MoBIoS provides a framework in which complex bioinformatic algorithms may be effectively expressed and executed as concise, declarative, SQL-like (Structured Query Language) queries.
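To make the idea of metric-space indexing concrete, the following is a minimal sketch of one well-known metric index, a vantage-point tree, answering a range query over short DNA strings under edit distance. It is an illustration only: the abstract does not say which index structure MoBIoS uses, and every name here (`edit_distance`, `VPNode`, `build`, `range_search`, the sample sequences) is hypothetical. The point it shows is that the triangle inequality lets a query skip whole subtrees instead of scanning every record; this toy version makes no claim about achieving O(log n) in practice.

```python
def edit_distance(a, b):
    """Levenshtein distance, a true metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


class VPNode:
    """One node of a vantage-point tree (illustrative structure)."""
    def __init__(self, point, radius, inside, outside):
        self.point = point      # the vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build(points, dist):
    """Recursively partition points by distance to a vantage point."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]
    if not rest:
        return VPNode(vantage, 0, None, None)
    dists = [dist(vantage, p) for p in rest]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(rest, dists) if d <= radius]
    outside = [p for p, d in zip(rest, dists) if d > radius]
    return VPNode(vantage, radius, build(inside, dist), build(outside, dist))


def range_search(node, query, r, dist, out):
    """Collect all points within distance r of query, pruning subtrees
    that the triangle inequality proves cannot contain a match."""
    if node is None:
        return out
    d = dist(node.point, query)
    if d <= r:
        out.append(node.point)
    if d - r <= node.radius:   # a match could lie inside the ball
        range_search(node.inside, query, r, dist, out)
    if d + r > node.radius:    # a match could lie outside the ball
        range_search(node.outside, query, r, dist, out)
    return out


# Hypothetical sample data: find sequences within one edit of "ACGT".
seqs = ["ACGT", "ACGA", "TCGT", "GGGG", "ACCT", "TTTT"]
tree = build(seqs, edit_distance)
hits = sorted(range_search(tree, "ACGT", 1, edit_distance, []))
print(hits)
```

The index depends only on the distance function, so the same tree code could in principle serve any data type for which a metric is defined, which is the property a metric-space storage manager relies on.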
