Searching for frameshift evolutionary relationships between protein sequence families

The protein sequence database was analyzed for evidence that some distinct sequence families might be distantly related in evolution through changes in the frame of translation. Sequences were compared using special amino acid substitution matrices for the alternate frames of translation. The statistical significance of alignment scores was computed both in the true database and in shuffled versions of the database that preserve any potential codon bias. Comparing the results from these two databases provides a very sensitive method for detecting remote relationships. We find a weak but measurable relatedness within the database as a whole, supporting the notion that some proteins may have evolved from others through changes in frame of translation. We also quantify residual homology in the ordinary sense within a database of generally unrelated sequences. Proteins 1999;37:278–283. ©1999 Wiley‐Liss, Inc.
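
The idea of judging an alignment score against a shuffled control can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the residue-level shuffle (which preserves only amino acid composition and is a simpler control than the codon-bias-preserving shuffle described above), the toy match/mismatch scoring function, the ungapped local alignment, and all function names. It is not the procedure or the substitution matrices used in the paper.

```python
import random


def shuffle_sequence(seq, rng):
    """Return a random permutation of seq (composition is preserved)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def toy_score(x, y):
    """Hypothetical scoring: +2 for identity, -1 otherwise.
    A real analysis would use a substitution matrix instead."""
    return 2 if x == y else -1


def ungapped_best_score(a, b, score):
    """Best-scoring ungapped local segment pair over all diagonals."""
    best = 0
    for offset in range(-(len(a) - 1), len(b)):
        i, j = max(0, -offset), max(0, offset)
        running = 0
        while i < len(a) and j < len(b):
            # Reset to zero when the running segment score goes negative.
            running = max(0, running + score(a[i], b[j]))
            best = max(best, running)
            i += 1
            j += 1
    return best


def empirical_p_value(query, target, n_shuffles=200, seed=0):
    """Fraction of shuffled targets scoring at least as well as the real one."""
    rng = random.Random(seed)
    observed = ungapped_best_score(query, target, toy_score)
    hits = 0
    for _ in range(n_shuffles):
        control = shuffle_sequence(target, rng)
        if ungapped_best_score(query, control, toy_score) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    obs, p = empirical_p_value("MKTAYIAKQRQISFVKSHF", "GGMKTAYIAKQRQLLSFVK")
    print(obs, p)
```

A small empirical p-value means that few shuffled controls score as well as the real pair. Applied across a whole database, the same comparison of real and shuffled score distributions is what allows a weak overall signal to be measured.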
