Bayesian Models and Algorithms for Protein Beta-Sheet Prediction

Prediction of a protein’s three-dimensional structure benefits greatly from information about its secondary structure, solvent accessibility, and the non-local contacts that stabilize it. We address the problem of β-sheet prediction, defined as the prediction of β-strand pairings, interaction types (parallel or anti-parallel), and β-residue interactions (contact maps). For proteins with six or fewer β-strands, we introduce a Bayesian approach that models conformational features in a probabilistic framework by combining amino acid pairing potentials with a priori knowledge of β-strand organization. To select the optimal β-sheet architecture, we substantially reduce the search space with heuristics that enforce pairings between amino acids with strong interaction potentials. In addition, we find the optimal pairwise alignment between β-strands by dynamic programming, allowing any number of gaps in an alignment so that β-bulges are modeled more effectively. For proteins with more than six β-strands, we first compute β-strand pairings using the BetaPro method. We then compute gapped alignments of the paired β-strands and choose the interaction types and β-residue pairings with the maximum alignment scores. In a 10-fold cross-validation experiment on the BetaSheet916 set, we obtained significant improvements in prediction accuracy.
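The gapped strand alignment and the choice of interaction type by maximum alignment score can be illustrated with a minimal Python sketch. It assumes a Smith-Waterman-style local alignment with a linear gap penalty (an unpaired residue standing in for a β-bulge) and a hypothetical toy pairing potential, `pair_score`; the paper's actual pairing potentials, gap model, and Bayesian architecture scoring are not reproduced here. An anti-parallel pairing is scored by aligning one strand against the reverse of the other, and the higher-scoring orientation is reported, with ties going to anti-parallel as an arbitrary choice.

```python
from typing import List, Tuple

# Hypothetical toy pairing potential (an assumption for illustration only);
# the paper's amino-acid pairing potentials are not reproduced here.
HYDROPHOBIC = set("VILFMWY")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def pair_score(a: str, b: str) -> float:
    """Toy score for residue a facing residue b across two paired strands."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return 2.0
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return 1.0
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return -1.0
    return -0.5


def align_strands(s: str, t: str, gap: float = 1.5) -> Tuple[float, List[Tuple[int, int]]]:
    """Local (Smith-Waterman-style) alignment of strand s against strand t.

    Any number of gaps is allowed, each costing `gap` (linear penalty), so a
    residue left unpaired inside a strand can stand in for a beta-bulge.
    Returns the best score and the aligned (i, j) residue index pairs.
    """
    n, m = len(s), len(t)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_ij = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,
                H[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),  # pair s[i-1] with t[j-1]
                H[i - 1][j] - gap,  # s[i-1] left unpaired (bulge in s)
                H[i][j - 1] - gap,  # t[j-1] left unpaired (bulge in t)
            )
            if H[i][j] > best:
                best, best_ij = H[i][j], (i, j)
    # Trace back from the best cell until the local alignment starts (score 0).
    pairs: List[Tuple[int, int]] = []
    i, j = best_ij
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


def best_pairing(a: str, b: str, gap: float = 1.5):
    """Score both interaction types and keep the higher-scoring one."""
    par_score, par_pairs = align_strands(a, b, gap)
    # Anti-parallel: a runs against b in reverse, so align a with b[::-1].
    anti_score, rev_pairs = align_strands(a, b[::-1], gap)
    # Map indices in the reversed strand back to positions in b.
    anti_pairs = [(i, len(b) - 1 - j) for i, j in rev_pairs]
    if anti_score >= par_score:  # ties go to anti-parallel (arbitrary choice)
        return "anti-parallel", anti_score, anti_pairs
    return "parallel", par_score, par_pairs
```

For example, `align_strands("VVKVV", "VVVV")` scores 6.5 by leaving the lysine unpaired, as a bulge would, and `best_pairing("KVE", "EVK")` prefers the parallel register, in which K faces E and E faces K.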

[1] Burkhard Rost, et al. PROFcon: novel prediction of long-range contacts, 2005, Bioinformatics.

[2] P. S. Kim, et al. Context is a major determinant of beta-sheet propensity, 1994, Nature.

[3] W. Kabsch, et al. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, 1983, Biopolymers.

[4] B. Berger, et al. Betawrap: successful prediction of parallel β-helices from primary sequence reveals an association with many microbial pathogens, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[5] T. N. Bhat, et al. The Protein Data Bank, 2000, Nucleic Acids Research.

[6] Pierre Baldi, et al. Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners, 2002, ISMB.

[7] D. Baker, et al. Design of a novel globular protein fold with atomic-level accuracy, 2003, Science.

[8] Tanja Kortemme, et al. Design of a 20-amino acid, three-stranded β-sheet protein, 1998.

[9] Alessandro Vullo, et al. A two-stage approach for improved prediction of residue contact maps, 2006, BMC Bioinformatics.

[10] Minoru Asogawa, et al. Beta-sheet prediction using inter-strand residue pairs and refinement with Hopfield neural network, 1997, ISMB.

[11] C. Sander, et al. Specific recognition in the tertiary structure of beta-sheets of proteins, 1980, Journal of Molecular Biology.

[12] Y. Mandel-Gutfreund, et al. Contributions of residue pairing to beta-sheet formation: conservation and covariation of amino acid residue pairs on antiparallel beta-strands, 2001, Journal of Molecular Biology.

[13] K. Burrage, et al. Protein contact prediction using patterns of correlation, 2004, Proteins.

[14] O. Gotoh. An improved algorithm for matching biological sequences, 1982, Journal of Molecular Biology.

[15] M. J. Sternberg, et al. On the conformation of proteins: the handedness of the connection between parallel beta-strands, 1977, Journal of Molecular Biology.

[16] L. Gregoret, et al. Context-dependence of amino acid residue pairing in antiparallel β-sheets, 1999.

[17] J. Thornton, et al. Determinants of strand register in antiparallel β-sheets of proteins, 1998, Protein Science.

[18] Sean R. Eddy, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, 1998.

[19] J. Thornton, et al. Prediction of strand pairing in antiparallel and parallel β-sheets using information theory, 2002, Proteins.

[20] Pierre Baldi, et al. Improved residue contact prediction using support vector machines and a large feature set, 2007, BMC Bioinformatics.

[21] Richard Bonneau, et al. Distributions of beta sheets in proteins with application to structure prediction, 2002, Proteins.

[22] L. Regan, et al. Guidelines for protein design: the energetics of β-sheet side chain interactions, 1995, Science.

[23] Alessandro Vullo, et al. Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins, 2006, BMC Bioinformatics.

[24] Taehyo Kim, et al. Mean curvature as a major determinant of beta-sheet propensity, 2006, Bioinformatics.

[25] Richard Hughey, et al. Hidden Markov models for detecting remote protein homologies, 1998, Bioinformatics.

[26] T. Hubbard, et al. Fold recognition and ab initio structure predictions using hidden Markov models and β-strand pair potentials, 1995, Proteins.

[27] Pierre Baldi, et al. SCRATCH: a protein structure and structural feature prediction server, 2005, Nucleic Acids Research.

[28] Pierre Baldi, et al. TMBpro: secondary structure, β-contact, and tertiary structure prediction of transmembrane β-barrel proteins, 2007.

[29] Peter Clote, et al. Predicting transmembrane β-barrels and interstrand residue interactions from sequence, 2006, Proteins.

[30] Piotr Berman, et al. Bringing folding pathways into strand pairing prediction, 2007, WABI.

[31] Matching protein β-sheet partners by feedforward and recurrent neural networks, 2000.

[32] S. B. Needleman, et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins, 1970, Journal of Molecular Biology.

[33] Yanay Ofran, et al. Prediction of protein structure through evolution, 2008.

[34] M. A. Wouters, et al. An analysis of side chain interactions and pair correlations within antiparallel β-sheets: the differences between backbone hydrogen-bonded and non-hydrogen-bonded residue pairs, 1995, Proteins.

[35] S. H. Kim, et al. The anatomy of protein beta-sheet topology, 2000, Journal of Molecular Biology.

[36] Pierre Baldi, et al. Three-stage prediction of protein β-sheets by neural networks, alignments and graph algorithms, 2005, ISMB.

[37] Robert M. MacCallum, et al. Striped sheets and protein contact prediction, 2004, ISMB/ECCB.

[38] L. Regan, et al. Modulating protein folding rates in vivo and in vitro by side-chain interactions between the parallel β strands of green fluorescent protein, 2000, The Journal of Biological Chemistry.