Study of protein sequence comparison metrics on the Connection Machine CM-2

Software tools have been developed to perform rapid, large-scale protein sequence comparisons on databases of amino-acid sequences using a data-parallel computer architecture. This software makes it possible to compare a protein against a database of several thousand proteins in the time a conventional computer needs for a single protein-protein comparison. Biologists can therefore find relevant similarities much more quickly and evaluate many different comparison metrics in a reasonable period of time. The software was used to analyze how effectively various scoring metrics determine sequence similarity, and to generate statistical information about how these scoring systems behave as certain parameters are varied.
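The abstract does not say which comparison algorithm or scoring scheme was used, so the following is only a minimal sketch under stated assumptions. It assumes a Smith-Waterman-style local alignment score with a flat match/mismatch score and a linear gap penalty; the function names (`local_alignment_score`, `search_database`, `score_statistics`) and all parameter values are hypothetical. It illustrates the two tasks described above: scoring one query against every database entry, and collecting score statistics over random sequences as one parameter (the gap penalty) is varied.

```python
import random
import statistics


# Assumed scoring scheme (hypothetical values, not from the paper):
# flat match/mismatch scores and a linear gap penalty.
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score, keeping one DP row at a time."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a score never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query, database, **params):
    """Score the query against every database entry.

    The comparisons are independent of one another, so on a data-parallel
    machine each could be given its own processor (an assumption about the
    mapping); here they simply run one after another.
    """
    return [local_alignment_score(query, s, **params) for s in database]


def score_statistics(query, length, n, gap, seed=0):
    """Mean and population std. dev. of scores against n random sequences."""
    rng = random.Random(seed)  # fixed seed: same random database for every gap
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"
    database = ["".join(rng.choice(amino_acids) for _ in range(length))
                for _ in range(n)]
    scores = search_database(query, database, gap=gap)
    return statistics.mean(scores), statistics.pstdev(scores)


if __name__ == "__main__":
    db = ["HEAGAWGHEE", "PAWHEAE", "MKTAYIAKQR"]
    print(search_database("PAWHEAE", db))
    for gap in (-1, -2, -4):
        mean, sd = score_statistics("PAWHEAE", length=50, n=200, gap=gap)
        print(f"gap={gap}: mean={mean:.2f} sd={sd:.2f}")
```

Because the random database is regenerated from the same seed for each gap value, the sweep compares scoring parameters on identical data; a harsher gap penalty can only lower (never raise) each local alignment score, so the mean score is non-increasing across the sweep.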
