Automatic Procedure to Extract Signature Pentapeptides from the Protein Sequence Database

We describe a method for extracting signature pentapeptides: pentapeptides that are conserved within a group of homologous proteins and found exclusively in that group. The BLAST algorithm is used both to perform the homology search and to count occurrences of pentapeptide patterns while allowing limited substitutions. For each pentapeptide that appears in a given sequence, we examine how often it and related pentapeptides occur in homologous sequences, which are ordered by homology score. Comparing these frequencies against those in the entire database lets us extract uniquely conserved pentapeptides and, at the same time, group the homologous sequences. The procedure thus automatically identifies pentapeptides, if any exist, that are strongly tied to the group. Possibility of using our pentapeptide word dictionary to infer protein function is
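The core comparison, frequency within a homolog group versus frequency in the rest of the database, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes exact pentapeptide matching in place of BLAST-based counting with limited substitutions, and it takes the homolog group as given rather than deriving it from a homology search. The function names and the `min_group_fraction` and `max_outside` thresholds are hypothetical, and the sequences are toy data.

```python
from collections import Counter


def pentapeptides(seq, k=5):
    """Return the set of distinct k-residue words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_pentapeptides(query, homologs, database,
                            min_group_fraction=1.0, max_outside=0):
    """Pentapeptides of `query` that are conserved in the homolog group
    and rare outside it.

    Assumptions (not from the source): exact matching, a precomputed
    homolog list, and the two illustrative thresholds.
    """
    group = [query] + list(homologs)
    group_set = set(group)

    # Number of group sequences containing each pentapeptide.
    group_counts = Counter()
    for seq in group:
        group_counts.update(pentapeptides(seq))

    # Number of non-group database sequences containing each pentapeptide.
    outside_counts = Counter()
    for seq in database:
        if seq not in group_set:
            outside_counts.update(pentapeptides(seq))

    signatures = []
    for pep in pentapeptides(query):
        conserved = group_counts[pep] / len(group) >= min_group_fraction
        exclusive = outside_counts[pep] <= max_outside
        if conserved and exclusive:
            signatures.append(pep)
    return sorted(signatures)


# Toy example: three related sequences plus two unrelated database entries.
query = "MKTAYIAKQR"
homologs = ["GGKTAYIAPL", "KTAYIAWWW"]
database = [query] + homologs + ["MKTAYQQQQQ", "PLWWWAAAAA"]
print(signature_pentapeptides(query, homologs, database))
```

Relaxing `min_group_fraction` below 1.0 would tolerate pentapeptides missing from some group members, which is one crude stand-in for the substitution tolerance described above.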
