Direct Prediction of Intrinsically Disordered Protein Conformational Properties From Sequence

Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a single 3D structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means IDRs are largely absent from the PDB, which has contributed to a lack of computational approaches for predicting ensemble conformational properties from sequence. Here we combine rational sequence design, large-scale molecular simulations, and deep learning to develop ALBATROSS, a deep learning model for predicting IDR ensemble dimensions from sequence. ALBATROSS enables the instantaneous prediction of ensemble-average properties at proteome-wide scale. It is lightweight, easy to use, and accessible as both a locally installable software package and a point-and-click interface in the cloud. We first demonstrate the applicability of our predictors by examining the generalizability of sequence-ensemble relationships in IDRs. Then, we leverage the high-throughput nature of ALBATROSS to characterize emergent biophysical behavior of IDRs within and between proteomes.

Update from previous version: This preprint reports an updated version of the ALBATROSS network weights, trained on simulations of over 42,000 sequences. In addition, we provide new Colab notebooks that enable proteome-wide IDR prediction and annotation in minutes. All conclusions and observations made in versions 1 and 2 of this manuscript remain true and robust.
