β Edge strands in protein structure prediction and aggregation

Recognition between exposed edges of β‐sheets is a well‐established mode of protein–protein interaction and can have pathological consequences: it has been linked to the aggregation of proteins into fibrillar structures, which is associated with a number of predominantly neurodegenerative disorders. Protective mechanisms have evolved in the edge strands of β‐sheets that prevent the aggregation and insolubility of most natural β‐sheet proteins; such mechanisms are unfavorable in the interior of a β‐sheet. Distinguishing edge strands from central strands using sequence information alone is therefore important for predicting residues and mutations likely to be involved in aggregation, and is also a first step in predicting folding topology. Here we report support vector machine (SVM) and decision tree methods developed to distinguish edge strands from central strands in a representative set of protein domains. The rules generated by the decision tree method are in close agreement with our knowledge of protein structure and are potentially useful in a number of different biological applications. When trained on strands from proteins of known structure, using structure‐based (Dictionary of Secondary Structure in Proteins) strand assignments, both methods achieved mean cross‐validated prediction accuracies of ∼78%. These accuracies were reduced when strand assignments from secondary structure prediction were used instead. Further investigation revealed that this reduction could be explained by the significantly lower accuracy of standard secondary structure prediction methods for edge strands than for central strands.
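To make the phrase "mean cross‐validated prediction accuracy" concrete, here is a minimal sketch of how such a figure can be computed. It is not the authors' pipeline: a one‐feature threshold rule (a depth‐1 decision tree) stands in for the SVM and decision tree methods, and the feature (a count of charged residues per strand), the helper names, and the toy values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous, near-equal folds."""
    folds, start = [], 0
    for i in range(k):
        size = n_items // k + (1 if i < n_items % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_stump(features, labels):
    """Fit a single-threshold rule (a stand-in for a real classifier).

    Tries every observed value as a threshold, in both directions, and
    keeps the rule with the most correct training predictions.
    """
    best = None
    for threshold in sorted(set(features)):
        for edge_if_above in (True, False):
            preds = [(x >= threshold) == edge_if_above for x in features]
            correct = sum(p == y for p, y in zip(preds, labels))
            if best is None or correct > best[0]:
                best = (correct, threshold, edge_if_above)
    _, threshold, edge_if_above = best
    return lambda x: (x >= threshold) == edge_if_above


def cross_validated_accuracy(features, labels, k=5):
    """Mean of the per-fold accuracies, each measured on held-out strands."""
    per_fold = []
    for test_idx in k_fold_indices(len(features), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(features) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        predict = train_stump(train_x, train_y)
        hits = sum(predict(features[i]) == labels[i] for i in test_idx)
        per_fold.append(hits / len(test_idx))
    return mean(per_fold)


# Made-up toy data, NOT from the paper: one hypothetical feature per strand
# (number of charged residues) and a label (True = edge, False = central).
toy_features = [0, 3, 1, 4, 0, 2, 1, 3, 0, 4]
toy_labels = [False, True, False, True, False, True, False, True, False, True]

print(cross_validated_accuracy(toy_features, toy_labels, k=5))
```

The key design point is that the classifier is refitted for every fold, so each strand is always scored by a model that never saw it during training; the reported figure is the average over folds.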
