How old is your fold?

MOTIVATION: At present there exists no age estimate for the different protein structures found in nature. It has become clear from occurrence studies that different folds arose at different points in evolutionary time. An estimate of the age of different folds would be a starting point for many investigations into protein structure evolution, that is, into how we arrived at the set of folds we see today. It would also be a powerful tool in protein structure classification, allowing us to reassess the available hierarchical methods and perhaps suggest improvements.

RESULTS: We have created the first relative age estimation technique for protein folds. Our method is based on constructing parsimonious scenarios that describe occurrence patterns in a phylogeny of species. The ages presented are shown to be robust to the different trees and data types used for their generation. They correlate with other previously used protein age estimators, but appear to be far more discriminating than any previously suggested technique. The age estimates given are not absolute, but they already offer intriguing insights, such as the very different age patterns of alpha/beta folds compared with small folds: the alpha/beta folds appear on average to be far older than their small fold counterparts.

AVAILABILITY: Example trees and additional material are available at http://www.stats.ox.ac.uk/~abeln/foldage

SUPPLEMENTARY INFORMATION: http://www.stats.ox.ac.uk/~abeln/foldage
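
To make the idea of a parsimonious scenario concrete, the following is a minimal sketch of how a relative age could be read off a species tree from the presence or absence of a fold in each genome. It is not the authors' implementation. It assumes a rooted tree encoded as nested tuples, a Sankoff-style dynamic programme with hypothetical `gain` and `loss` costs, a fold that is absent before the root, and an age defined as one minus the depth of the earliest inferred gain divided by the tree height. All function names, cost values and the tie-breaking rule are illustrative assumptions.

```python
from typing import Set, Tuple, Union

# A leaf is a species name; an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]
INF = float("inf")


def sankoff(tree: Tree, present: Set[str], gain: float, loss: float):
    """Minimum event cost below `tree` if the fold is (absent, present) at its root."""
    if isinstance(tree, str):
        return (INF, 0.0) if tree in present else (0.0, INF)
    cost_absent, cost_present = 0.0, 0.0
    for child in tree:
        c_abs, c_pres = sankoff(child, present, gain, loss)
        cost_absent += min(c_abs, c_pres + gain)   # child may gain the fold
        cost_present += min(c_pres, c_abs + loss)  # child may lose the fold
    return cost_absent, cost_present


def gain_depths(tree, present, gain, loss, parent_state=0, depth=0):
    """Depths of all gain events in one minimum-cost scenario (top-down traceback)."""
    c_abs, c_pres = sankoff(tree, present, gain, loss)  # recomputed for brevity
    if parent_state == 0:
        state = 1 if c_pres + gain < c_abs else 0  # ties favour later gains
    else:
        state = 0 if c_abs + loss < c_pres else 1  # ties favour keeping the fold
    gains = [depth] if (parent_state == 0 and state == 1) else []
    if not isinstance(tree, str):
        for child in tree:
            gains += gain_depths(child, present, gain, loss, state, depth + 1)
    return gains


def height(tree: Tree) -> int:
    """Number of edges from this node to its deepest leaf."""
    return 0 if isinstance(tree, str) else 1 + max(height(c) for c in tree)


def relative_age(tree: Tree, present: Set[str], gain=1.0, loss=1.0) -> float:
    """1.0 if the fold is inferred at the root, 0.0 if it is gained only at leaves."""
    gains = gain_depths(tree, present, gain, loss)
    if not gains:
        raise ValueError("fold does not occur in any species of the tree")
    return 1.0 - min(gains) / height(tree)


# Toy tree with four hypothetical species.
toy_tree = (("A", "B"), ("C", "D"))
print(relative_age(toy_tree, {"A", "B", "C", "D"}))  # 1.0: present everywhere
print(relative_age(toy_tree, {"A", "B"}))            # 0.5: gained in one clade
print(relative_age(toy_tree, {"A"}))                 # 0.0: gained at a single leaf
```

The ratio of gain to loss cost controls how readily a scattered occurrence pattern is explained by an old fold with many losses rather than several recent gains: in this sketch, a fold found only in species A and C gets age 0.0 with `gain=1.0` but age 1.0 with `gain=3.0`.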
