Synonymous Constraint Elements Show a Tendency to Encode Intrinsically Disordered Protein Segments

Synonymous constraint elements (SCEs) are protein-coding genomic regions with very low synonymous mutation rates, which are believed to carry additional, overlapping functions. Thousands of such potentially multi-functional elements were recently discovered by analyzing the levels and patterns of evolutionary conservation in human coding exons. These elements offer a good opportunity to improve our understanding of how the redundancy of the genetic code is exploited in the cell. Our premise is that the protein segments encoded by such elements might better meet the increased functional demands if they are structurally less constrained (i.e., intrinsically disordered). To test this idea, we used computational tools to characterize the structural properties of the protein segments encoded by SCEs. In addition to SCEs, we examined the level of disorder, secondary structure, and sequence complexity of protein regions overlapping experimentally validated splice regulatory sites. We show that multi-functional gene regions translate into protein segments that, compared with reference segments of similar lengths, are significantly enriched in structural disorder and compositional bias and depleted in secondary structure and domain annotations. This tendency suggests that relaxed protein structural constraints provide an advantage when coding regions accommodate multiple overlapping functions.
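The abstract compares the sequence complexity of SCE-encoded segments against reference segments of similar lengths. It does not state which tools or statistics were used. The sketch below illustrates one way such a comparison could work, under two assumptions. First, local compositional complexity is measured as the Shannon entropy of residue composition in a sliding window. This is similar in spirit to the SEG program of Wootton and Federhen, but simplified: it has no trigger/extension stages. Second, significance comes from an empirical p-value against randomly drawn length-matched reference segments. This resampling test is a swapped-in technique, not necessarily the authors' method. The window length (12) and entropy threshold (2.2 bits) are illustrative assumptions, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the amino-acid composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_fraction(seq: str, window: int = 12, threshold: float = 2.2) -> float:
    """Fraction of residues covered by at least one low-entropy window.

    Hypothetical, SEG-like simplification: window and threshold are
    illustrative values, not the settings used in the study.
    """
    if len(seq) < window:
        return 0.0
    covered = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                covered[i] = True
    return sum(covered) / len(seq)


def length_matched_reference(proteins, length, n_samples, rng):
    """Draw random segments of exactly `length` residues from reference proteins."""
    eligible = [p for p in proteins if len(p) >= length]
    if not eligible:
        raise ValueError("no reference protein is long enough")
    segments = []
    for _ in range(n_samples):
        p = rng.choice(eligible)
        start = rng.randrange(len(p) - length + 1)
        segments.append(p[start:start + length])
    return segments


def empirical_p_value(query, proteins, n_samples=1000, seed=0):
    """One-sided test: how often is a length-matched reference segment
    at least as compositionally biased as the query segment?"""
    rng = random.Random(seed)
    observed = low_complexity_fraction(query)
    refs = length_matched_reference(proteins, len(query), n_samples, rng)
    hits = sum(low_complexity_fraction(r) >= observed for r in refs)
    # The +1 correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_samples + 1)


if __name__ == "__main__":
    # Toy data only: a glutamine-rich query versus a high-complexity reference.
    query = "Q" * 16
    reference = ["ACDEFGHIKLMNPQRSTVWY" * 3]
    print(empirical_p_value(query, reference, n_samples=50))
```

The same length-matched resampling scheme could, under the same assumptions, be reused for other per-residue properties (for example, the fraction of residues predicted to be disordered) by swapping `low_complexity_fraction` for a different scoring function.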
