ODiNPred: comprehensive prediction of protein order and disorder

Structural disorder is widespread in eukaryotic proteins and is vital for their function in diverse biological processes. It is therefore highly desirable to predict the degree of order and disorder from the amino acid sequence. It is, however, notoriously difficult to predict the degree of local flexibility within structured domains, as well as the presence and nuances of localized rigidity within intrinsically disordered regions. To identify such instances, we used the CheZOD database, which provides accurate, balanced, and continuous-valued quantification of protein (dis)order at amino acid resolution based on NMR chemical shifts. To predict the full spectrum of protein disorder as comprehensively as possible, we constructed the sequence-based protein order/disorder predictor ODiNPred, trained on an expanded version of CheZOD. ODiNPred uses a deep neural network with 157 unique sequence features, applied to 1325 protein sequences together with their experimental NMR chemical shift data. Cross-validation for 117 protein sequences shows that ODiNPred predicts the continuous variation in order along the protein sequence better than contemporary predictors, suggesting that those predictors are limited by the quality of their training data. The inclusion of evolutionary features reduces the performance gap between ODiNPred and its peers, but analysis shows that ODiNPred retains greater accuracy for the more challenging prediction of intermediate disorder.
