The relationship between proteome size, structural disorder and organism complexity

Background

Sequencing the genomes of the first few eukaryotes created the impression that gene number shows no correlation with organism complexity, an observation often referred to as the G-value paradox. Several attempts have been made to resolve this paradox, citing the multifunctionality of proteins, alternative splicing, microRNAs or non-coding DNA. Because intrinsic protein disorder has been linked with complex responses to environmental stimuli and with communication between cells, an additional possibility is that structural disorder effectively increases the complexity of species.

Results

We revisited the G-value paradox by analyzing many new proteomes from organisms whose complexity, measured as the number of distinct cell types, is known. We found that complexity and proteome size, measured as the total number of amino acids, correlate significantly and follow a power function relationship. We systematically analyzed numerous other features in relation to complexity in several organisms and tissues and found that: (i) the fraction of protein structural disorder increases significantly between prokaryotes and eukaryotes but does not increase further over the course of evolution; (ii) the number of predicted binding sites in disordered regions of a proteome increases with complexity; (iii) the fraction of protein disorder, predicted binding sites, alternative splicing and protein-protein interactions all increase with the complexity of human tissues.

Conclusions

We conclude that complexity is a multi-parametric trait, determined by interaction potential, alternative splicing capacity, tissue-specific protein disorder and, above all, proteome size. The G-value paradox is apparent only when plants are grouped with metazoans, as plants have a different relationship between complexity and proteome size.
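The power function relationship reported in the Results has the general form C = a·S^b, where C is the number of distinct cell types and S is the proteome size in amino acids. A common way to estimate a and b is ordinary least squares on the log-transformed values, since log C = log a + b·log S is linear. The sketch below illustrates that approach only; the abstract does not state which fitting procedure the authors used, and the function name `fit_power_law` and all numbers are hypothetical, generated from a known power law rather than taken from any real proteome.

```python
import math


def fit_power_law(sizes, cell_types):
    """Fit C = a * S**b by least squares on log-transformed data.

    Returns the tuple (a, b). All inputs must be positive.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in cell_types]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept of the log-log line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic illustration only: points generated from C = 2e-6 * S**1.2.
# These are NOT measurements from the paper or from any organism.
sizes = [1e6, 5e6, 1e7, 2e7]
cell_types = [2e-6 * s ** 1.2 for s in sizes]

a, b = fit_power_law(sizes, cell_types)
print(f"a = {a:.3g}, b = {b:.3f}")
```

Because the synthetic points lie exactly on a power law, the fit recovers the generating parameters. With real data the points scatter around the line, and fitting plants and metazoans separately would be the natural way to examine the different relationships mentioned in the Conclusions.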
