Prediction of unfolded segments in a protein sequence based on amino acid composition

MOTIVATION: Partially and wholly unstructured proteins have now been identified in all kingdoms of life, and they are more common in eukaryotes. This intrinsic disorder is associated with certain critical functions. Beyond their fundamental interest, unstructured regions may also prevent protein crystallization. Predicting disordered regions is therefore important for understanding protein function. It can also help in designing genetic constructs, for instance ones that exclude regions likely to hinder crystallization.

RESULTS: Here we present a computational tool for detecting unstructured regions in proteins. It is based on two properties of unfolded fragments: (1) disordered regions have a biased amino acid composition, and (2) they usually contain only small hydrophobic clusters or none at all. To quantify these two properties, we first compute the amino acid distributions in structured and unstructured regions. From these distributions, we compute, for a given sequence fragment, the probability that it belongs to a structured region and the probability that it belongs to an unstructured region. For each residue, we also compute the distance to the nearest hydrophobic cluster. Applying very simple rules to these three values along the protein sequence allows us to predict unstructured regions. The method requires only the primary sequence and no multiple alignment, which makes it well suited to orphan proteins.

AVAILABILITY: http://genomics.eu.org/
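The abstract names three per-residue quantities combined by simple rules but does not give the formulas, so the following Python sketch shows one plausible way to compute them. It is not the published tool. Everything specific here is an assumption: the 15-residue sliding window, add-one smoothing, equal priors (so the structured probability is simply one minus the unstructured one), the hydrophobic set VILFMYW, a simplified one-dimensional cluster definition (hydrophobic runs with gaps of at most one residue), the thresholds `p_min=0.5` and `d_min=4`, and all function names. The training segments are invented toy strings, not real structural data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed hydrophobic residues for cluster detection (illustrative choice).
HYDROPHOBIC = set("VILFMYW")


def composition(segments, pseudocount=1.0):
    """Smoothed amino acid frequencies pooled over a list of segments."""
    counts = Counter(aa for seg in segments for aa in seg if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def p_unstructured(seq, p_struct, p_unstruct, window=15):
    """Per-residue P(unstructured | window), assuming equal priors.

    P(structured | window) is then 1 minus this value.
    """
    half = window // 2
    probs = []
    for i in range(len(seq)):
        frag = seq[max(0, i - half):i + half + 1]
        # Log-likelihood of the window under each composition model,
        # treating residues as independent draws; unknown letters are skipped.
        ll_s = sum(math.log(p_struct[aa]) for aa in frag if aa in p_struct)
        ll_u = sum(math.log(p_unstruct[aa]) for aa in frag if aa in p_unstruct)
        probs.append(logistic(ll_u - ll_s))
    return probs


def hydrophobic_clusters(seq, max_gap=1, min_size=2):
    """Simplified 1D clusters: runs of hydrophobic residues interrupted by at
    most `max_gap` other residues and containing at least `min_size`
    hydrophobic ones. Returns inclusive (start, end) index pairs."""
    clusters = []
    start = end = None
    count = 0
    for i, aa in enumerate(seq):
        if aa not in HYDROPHOBIC:
            continue
        if start is not None and i - end - 1 <= max_gap:
            end, count = i, count + 1  # extend the current cluster
        else:
            if start is not None and count >= min_size:
                clusters.append((start, end))
            start, end, count = i, i, 1  # open a new candidate cluster
    if start is not None and count >= min_size:
        clusters.append((start, end))
    return clusters


def distance_to_cluster(length, clusters):
    """Distance in residues from each position to the nearest cluster
    (0 inside one, infinity if the sequence has no cluster)."""
    dist = [math.inf] * length
    for start, end in clusters:
        for i in range(length):
            d = start - i if i < start else (i - end if i > end else 0)
            dist[i] = min(dist[i], d)
    return dist


def predict_unstructured(seq, p_struct, p_unstruct, window=15, p_min=0.5, d_min=4):
    """Hypothetical rule: call a residue unstructured when its window favors
    the unstructured composition AND it lies at least d_min residues from
    any hydrophobic cluster."""
    probs = p_unstructured(seq, p_struct, p_unstruct, window)
    dist = distance_to_cluster(len(seq), hydrophobic_clusters(seq))
    return [p >= p_min and d >= d_min for p, d in zip(probs, dist)]


# Toy training segments, invented for illustration (not real structural data).
p_s = composition(["MKVLIAGLFVAWCLIVEAYRLIG", "GAVLIFWMYLTKVAIRLVG"])
p_u = composition(["SPEEKPQSGDSKPESGEPKQ", "GSSPEDKKPSQESNGEQPKS"])

seq = "LIVFLAVLIW" + "SPEKPQSGESKPEDSGPKQE"
calls = predict_unstructured(seq, p_s, p_u)
print("".join("D" if c else "-" for c in calls))
```

Treating the residues in a window as independent draws turns the composition comparison into a log-likelihood ratio, which is cheap and needs only the single sequence, in line with the abstract's point that no multiple alignment is required. A practical version would estimate the two distributions from annotated structures and tune the window and thresholds on benchmark data.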
