Access the most recent version at doi: 10.1110/ps.03197403

Recently we proposed a novel method of alignment–alignment comparison, COMPASS (the tool for COmparison of Multiple Protein Alignments with Assessment of Statistical Significance). Here we present several examples of relations between PFAM protein families that were detected by COMPASS and that lead to predictions of presently unresolved protein structures. We discuss relatively straightforward COMPASS predictions that are new and interesting to us, and that would require substantial time and effort to justify even for a skilled PSI‐BLAST user. All of the presented COMPASS hits are independently confirmed by other methods, including the ab initio structure‐prediction method ROSETTA. The tertiary structure predictions made by ROSETTA proved useful for improving sequence‐derived alignments, because they are based on a reasonable folding of the polypeptide chain rather than on information from sequence databases. The ability of COMPASS to predict new relations within the PFAM database indicates the high sensitivity of COMPASS searches and substantiates its potential value for the discovery of previously unknown similarities between protein families.
