Precise Generation of Conformational Ensembles for Intrinsically Disordered Proteins via Fine-tuned Diffusion Models

Intrinsically disordered proteins (IDPs) play pivotal roles in various biological functions and are closely linked to many human diseases, including cancer. Structural investigations of IDPs typically combine molecular dynamics (MD) simulations with experimental data to correct for intrinsic biases in simulation methods. However, the high computational cost of these simulations and the scarcity of experimental data severely limit their applicability. Despite recent advances in structure prediction for structured proteins, understanding the conformational properties of IDPs remains challenging, partly because of the poor conservation of disordered protein sequences and limited experimental characterization. Here, we introduce IDPFold, a method that predicts IDP conformational ensembles directly from sequence using fine-tuned diffusion models. IDPFold bypasses the need for multiple sequence alignments (MSAs) or experimental data, achieving accurate predictions of ensemble properties across numerous IDPs. By sampling conformations at the backbone level, IDPFold provides more detailed structural features and more precise property estimates than state-of-the-art methods, and will help reveal the disorder-function paradigm of IDPs.
