Searching the protein sequence database.

As the volume of protein sequence data grows, rapid methods for searching the protein sequence database are becoming of primary importance. The well-known dynamic programming algorithms give a rigorous comparison of two sequences, but they are too slow for routine searches of the entire database. In this paper we discuss some methods that can be used for rapid searches.
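To make the cost of the rigorous approach concrete, the following is a minimal sketch of a dynamic programming global alignment score in the style of Needleman and Wunsch, applied to every database entry in turn. The function names and the simple match/mismatch/gap scores are assumptions chosen for illustration, not the scoring scheme of the paper; a realistic protein comparison would normally use an amino acid substitution matrix.

```python
from typing import Dict, List, Tuple


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score by dynamic programming.

    Illustrative scoring only (assumed values): +1 match, -1 mismatch,
    -1 per gap position. Time is O(len(a) * len(b)); only two rows of
    the table are kept, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
        prev = curr
    return prev[len(b)]


def search_database(query: str,
                    database: Dict[str, str]) -> List[Tuple[int, str]]:
    """Score the query against every entry; return (score, name), best first.

    `database` is a hypothetical in-memory mapping from entry name to sequence.
    """
    scored = [(global_alignment_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)
```

Each comparison fills a table with one cell per pair of residues, so an exhaustive search costs roughly the query length times the total number of residues in the database. That product is what makes this approach too slow for routine use and motivates faster search methods.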
