The role of internal duplication in the evolution of multi-domain proteins

Many proteins consist of several structural domains. These multi-domain proteins have likely been generated by selective genome growth dynamics during evolution, both to perform new functions and to create structures that fold on a biologically feasible time scale. Domain units have frequently evolved through a variety of genetic shuffling mechanisms. Here we examine the protein domain statistics of more than 1000 organisms, including eukaryotic, archaeal and bacterial species. The analysis extends earlier findings on asymmetric statistical laws for the proteome to a wider variety of species. While proteins are composed of a wide range of domains, with a distribution that displays a power-law decay, the number of distinct domain families computed for each protein follows an exponential distribution, characterizing a protein universe composed of a small ("thin") number of unique families. Structural studies in proteomics have shown that domain repeats, or internally duplicated domains, represent a small but significant fraction of the genome. In spite of its importance, this observation has been largely overlooked until recently. We model the evolutionary dynamics of the proteome and demonstrate that these distinct distributions are in fact rooted in an internal duplication mechanism. This process generates the contemporary protein structural domain universe, determines its reduced thickness, and tames its growth. These findings have important implications, ranging from protein interaction network modeling to evolutionary studies based on the fundamental mechanisms governing genome expansion.
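The two per-protein quantities contrasted above (how many domains a protein carries versus how many distinct families those domains belong to) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `domain_statistics`, the input layout (protein identifier mapped to an ordered list of family labels), and the toy proteome are all assumptions made for illustration only.

```python
from collections import Counter


def domain_statistics(architectures):
    """Summarize a proteome given as {protein_id: [family label per domain]}.

    Returns three values:
      - histogram of total domains per protein,
      - histogram of distinct families per protein,
      - fraction of proteins containing at least one repeated family
        (a simple proxy for internal duplication).
    """
    n_domains = Counter()
    n_families = Counter()
    with_repeats = 0
    for domains in architectures.values():
        distinct = set(domains)
        n_domains[len(domains)] += 1
        n_families[len(distinct)] += 1
        if len(distinct) < len(domains):
            with_repeats += 1
    total = len(architectures)
    repeat_fraction = with_repeats / total if total else 0.0
    return n_domains, n_families, repeat_fraction


# Hypothetical toy proteome; the labels are placeholders, not real annotations.
toy = {
    "P1": ["Ig", "Ig", "Ig", "FN3"],
    "P2": ["Kinase"],
    "P3": ["SH3", "SH2", "Kinase"],
    "P4": ["Ig", "Ig"],
}
n_domains, n_families, repeat_fraction = domain_statistics(toy)
print(dict(n_domains), dict(n_families), repeat_fraction)
```

In this toy input, proteins P1 and P4 contain repeated families, so their distinct-family counts are smaller than their domain counts. On real annotation data the two histograms would be the quantities whose tails are compared against power-law and exponential forms.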
