The organization of domains in proteins obeys Menzerath-Altmann’s law of language

Background: The combination of domains in multidomain proteins enhances their function and structure but lengthens the molecules and increases their cost at the cellular level.

Methods: The dependence of domain length on the number of domains a protein holds was surveyed for a set of 60 proteomes representing free-living organisms from all kingdoms of life. Distributions were fitted using non-linear functions, and the fitted parameters were interpreted with a formulation of decreasing returns.

Results: We find that domain length decreases with increasing number of domains in proteins, following the Menzerath-Altmann (MA) law of language. Highly significant negative correlations exist for the set of proteomes examined. Mathematically, the MA law is expressed as a power-law relationship that unfolds when molecular persistence P is a function of domain accretion. P holds two terms, one reflecting the matter-energy cost of adding domains and extending their length, the other reflecting how domain length and number impinge on information and biophysics. The pattern of diminishing returns can therefore be explained as a frustrated interplay between the strategies of economy, flexibility and robustness, matching previously observed trade-offs in the domain makeup of proteomes. Proteomes of Archaea, Fungi and, to a lesser degree, Plants show the largest push towards molecular economy, each at its own economic stratum. Fungi increase domain size in single-domain proteins while reinforcing the pattern of diminishing returns. In contrast, Metazoa, and to lesser degrees Protista and Bacteria, relax economy. Metazoa achieve maximum flexibility and robustness by harboring compact molecules and complex domain organization, offering a new functional vocabulary for molecular biology.

Conclusions: The tendency of parts to decrease in size when systems enlarge is universal for language and music, and now for parts of macromolecules, extending the MA law to natural systems.
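The power-law form of the MA law mentioned in the Results can be illustrated with a minimal sketch. It assumes the simplest form, mean domain length = a · (number of domains)^b with b < 0, and fits it by ordinary least squares in log-log space. This is a swapped-in simplification, not the non-linear fitting procedure used in the study; the function name `fit_power_law` and the numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_power_law(n_domains, mean_length):
    """Fit mean_length = a * n_domains**b by least squares on log-transformed data.

    Assumption: a pure power law; this is a log-log linear regression,
    not a direct non-linear fit.
    """
    x = np.log(np.asarray(n_domains, dtype=float))
    y = np.log(np.asarray(mean_length, dtype=float))
    # polyfit returns coefficients from highest degree down: slope, then intercept
    b, log_a = np.polyfit(x, y, 1)
    return float(np.exp(log_a)), float(b)

# Made-up illustrative values, NOT data from the study:
# domains per protein and a synthetic mean domain length (in residues).
n = np.array([1, 2, 3, 4, 5, 6])
length = 250.0 * n ** -0.3

a, b = fit_power_law(n, length)
print(f"a = {a:.1f}, b = {b:.2f}")
```

A negative exponent b corresponds to the reported trend of shorter domains in proteins with more domains. With real, noisy data a direct non-linear fit may weight the points differently from the log-log regression shown here.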
