High‐throughput discovery of functional disordered regions: investigation of transactivation domains

Over 40% of proteins encoded by any eukaryotic genome contain intrinsically disordered regions (IDRs) that do not adopt defined tertiary structures. Certain IDRs perform critical functions, but discovering them is non‐trivial because their function depends on the biological context. We present IDR‐Screen, a framework to discover functional IDRs in a high‐throughput manner by simultaneously assaying large numbers of DNA sequences that encode short disordered segments. Sequence patterns that confer functionality are then inferred through statistical learning. Using a yeast HSF1 transcription factor‐based assay, we discovered IDRs that function as transactivation domains (TADs) by screening a random sequence library and a designed library of variants of 13 diverse TADs. Using machine learning, we find that the absence of positively charged residues, combined with redundant short sequence patterns of negatively charged and aromatic residues, is a generic feature of TAD functionality. We anticipate that using IDR‐Screen to investigate defined sequence libraries for specific functions can facilitate discovering novel functional regions of the disordered proteome, as well as understanding the impact of natural and disease variants in disordered segments.
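The sequence features described above (no positively charged residues, plus recurring short patterns of acidic and aromatic residues) can be made concrete with a small featurizer. This is an illustrative sketch only, not the IDR‐Screen model: the residue classes, the acidic/aromatic dipeptide pattern, the hypothetical `min_pairs` threshold and the made-up example peptides are all assumptions chosen for demonstration.

```python
import re
from collections import Counter

# Residue classes by one-letter code. Leaving histidine out of POSITIVE is an
# assumption (it is often uncharged near neutral pH).
POSITIVE = set("KR")
NEGATIVE = set("DE")
AROMATIC = set("FWY")

# Dipeptides pairing an acidic and an aromatic residue, in either order.
# The zero-width lookahead lets overlapping matches be counted separately.
ACID_AROMATIC = re.compile(r"(?=([DE][FWY]|[FWY][DE]))")


def tad_features(seq: str) -> dict:
    """Composition and short-pattern features for a single peptide."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {
        "frac_positive": sum(counts[a] for a in POSITIVE) / n,
        "frac_negative": sum(counts[a] for a in NEGATIVE) / n,
        "frac_aromatic": sum(counts[a] for a in AROMATIC) / n,
        "acid_aromatic_pairs": len(ACID_AROMATIC.findall(seq)),
    }


def looks_tad_like(seq: str, min_pairs: int = 2) -> bool:
    """Hypothetical rule of thumb: no K/R and repeated acidic/aromatic pairs."""
    f = tad_features(seq)
    return f["frac_positive"] == 0 and f["acid_aromatic_pairs"] >= min_pairs


if __name__ == "__main__":
    # Made-up peptides: one acidic/aromatic-rich, one basic.
    for peptide in ["DFDLDMLGDDEFDWE", "KRLSAGKKRLAG"]:
        print(peptide, tad_features(peptide), looks_tad_like(peptide))
```

In an actual screen, features like these would be computed for every library member and passed, together with the measured activity, to a statistical learner; the fixed rule here only shows what such features capture.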
