Large-Scale Prediction of Disulphide Bond Connectivity

The formation of disulphide bridges between cysteines is an important feature of protein structure. Here we develop new methods for predicting disulphide bond connectivity. We first build a large, curated data set of proteins containing disulphide bridges and then use two-dimensional recursive neural networks (2D-RNNs) to predict bonding probabilities between pairs of cysteines. These probabilities in turn define a weighted graph matching problem that can be solved efficiently. We show that the method consistently outperforms previous approaches on the same validation data. In addition, the method handles chains with an arbitrary number of bonded cysteines, overcoming a major limitation of previous approaches, which restricted predictions to chains containing no more than 10 oxidized cysteines. The method applies both when the bonded state of each cysteine is known and when it is unknown; in the latter case, the bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate of the total number of disulphide bridges in each chain.
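The step from pairwise probabilities to a connectivity pattern can be illustrated with a small sketch. It assumes the 2D-RNN has already produced a symmetric matrix `prob` of bonding probabilities, and it uses those probabilities directly as edge weights (an assumption; the original weighting may differ). The function name `max_weight_matching` is hypothetical. This sketch finds an exact maximum-weight matching by dynamic programming over subsets of cysteines, which is practical only up to roughly 20 cysteines; for larger chains a polynomial-time method such as Edmonds' blossom algorithm (for example, `networkx.max_weight_matching`) would be the natural substitute.

```python
from functools import lru_cache


def max_weight_matching(prob):
    """Exact maximum-weight matching over cysteines via DP on bitmasks.

    Hypothetical helper. prob[i][j] is an assumed symmetric predicted
    bonding probability between cysteines i and j, used directly as the
    edge weight (an assumption for illustration).
    Returns (total_weight, sorted list of (i, j) bonded pairs).
    Runtime is exponential in the number of cysteines.
    """
    n = len(prob)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def best(used):
        # All cysteines decided: nothing more to add.
        if used == full:
            return 0.0, ()
        # Lowest-index cysteine not yet paired or skipped.
        i = 0
        while (used >> i) & 1:
            i += 1
        # Option 1: leave cysteine i unpaired (free cysteine).
        result = best(used | (1 << i))
        # Option 2: bond i with a later, still-undecided cysteine j.
        for j in range(i + 1, n):
            if not (used >> j) & 1:
                w, pairs = best(used | (1 << i) | (1 << j))
                w += prob[i][j]
                if w > result[0]:
                    result = (w, ((i, j),) + pairs)
        return result

    total, pairs = best(0)
    return total, sorted(pairs)


if __name__ == "__main__":
    # Toy probabilities for four cysteines (made-up numbers).
    P = [
        [0.0, 0.9, 0.1, 0.2],
        [0.9, 0.0, 0.3, 0.1],
        [0.1, 0.3, 0.0, 0.8],
        [0.2, 0.1, 0.8, 0.0],
    ]
    total, pairs = max_weight_matching(P)
    print(pairs)  # → [(0, 1), (2, 3)]
```

With these toy numbers, pairing 0–1 and 2–3 (total weight 1.7) beats the two alternative perfect matchings (0.2 and 0.5), so the matching selects the globally consistent pattern rather than greedily taking individual high-probability pairs.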