Polymorphism Analysis Reveals Reduced Negative Selection and Elevated Rate of Insertions and Deletions in Intrinsically Disordered Protein Regions

Intrinsically disordered protein regions are abundant in eukaryotic proteins and lack stable tertiary structures and enzymatic functions. Previous studies of disordered region evolution based on interspecific alignments have revealed an increased propensity for indels and rapid rates of amino acid substitution. How disordered regions are maintained at high abundance in the proteome and across taxa, despite apparently weak evolutionary constraints, remains unclear. Here, we use single nucleotide and indel polymorphism data from yeast and human populations to survey population variation within disordered regions. First, we show that single nucleotide polymorphisms in disordered regions are under weaker negative selection than those in more structured protein regions, and that disordered regions have a higher proportion of neutral nonsynonymous sites. We also confirm previous findings that nonframeshifting indels are much more abundant in disordered regions than in structured regions. We find that the rate of nonframeshifting indel polymorphism in intrinsically disordered regions resembles that of noncoding DNA and pseudogenes, and that large indels segregate in disordered regions in the human population. Our survey of polymorphism confirms the patterns of evolution in disordered regions previously inferred from comparisons over longer evolutionary timescales.
