Predicting absolute contact numbers of native protein structure from amino acid sequence

The contact number of an amino acid residue in a protein structure is defined as the number of Cβ atoms around the Cβ atom of the given residue, a quantity similar to, but different from, solvent accessible surface area. We present a method to predict the contact numbers of a protein from its amino acid sequence. The method is based on a simple linear regression scheme and predicts the absolute values of contact numbers. When single sequences are used for both parameter estimation and cross‐validation, the method predicts the contact numbers with an average correlation coefficient of 0.555. When multiple sequence alignments are used, the correlation increases to 0.627, a significant improvement over previous methods. For discrete‐state prediction, the accuracies of 2‐, 3‐, and 10‐state predictions are, respectively, 71.4%, 54.1%, and 18.9% with residue type‐dependent unbiased thresholds, and 76.3%, 59.2%, and 21.8% with residue type‐independent unbiased thresholds. We discuss how accessible surface area and contact number differ from a prediction viewpoint, and how contact number prediction can be applied to three‐dimensional structure prediction. Proteins 2005. © 2004 Wiley‐Liss, Inc.
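
The two ideas in the abstract, counting Cβ neighbours and fitting a linear regression from sequence to contact number, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 12 Å cutoff, the minimum sequence separation, the one-hot window encoding, and every function name below (`contact_numbers`, `window_features`, `fit_linear`, `predict`, `pearson`) are assumptions made for illustration; the abstract does not state these details. Residues without a Cβ atom (glycine) are assumed to be handled by the caller, for example by supplying a substitute coordinate.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def contact_numbers(cb_coords, cutoff=12.0, min_separation=3):
    """Count C-beta atoms within `cutoff` angstroms of each residue's C-beta.

    `cutoff` and `min_separation` are illustrative assumptions, not values
    taken from the abstract. Pairs closer than `min_separation` positions
    along the chain (including the residue itself) are not counted.
    """
    coords = np.asarray(cb_coords, dtype=float)
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(n)
    far_in_sequence = np.abs(idx[:, None] - idx[None, :]) >= min_separation
    return ((dist < cutoff) & far_in_sequence).sum(axis=1)


def window_features(sequence, half_window=2):
    """One-hot encode the sequence window around each residue.

    The last column is a constant bias term. Window positions that fall
    outside the chain, or hold an unknown residue code, stay all-zero.
    """
    n = len(sequence)
    width = 2 * half_window + 1
    n_aa = len(AMINO_ACIDS)
    feats = np.zeros((n, width * n_aa + 1))
    feats[:, -1] = 1.0
    for i in range(n):
        for k in range(width):
            j = i + k - half_window
            if 0 <= j < n and sequence[j] in AA_INDEX:
                feats[i, k * n_aa + AA_INDEX[sequence[j]]] = 1.0
    return feats


def fit_linear(features, targets):
    """Least-squares estimate of the regression weights."""
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights


def predict(features, weights):
    """Predicted absolute contact numbers, one per residue."""
    return features @ weights


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])
```

In use, one would compute `contact_numbers` for each chain in a training set, stack the rows from `window_features` across chains, call `fit_linear` once, and then report `pearson(predict(X_test, w), y_test)` on held-out chains. Replacing the one-hot rows with per-position amino acid frequencies from a multiple sequence alignment would be one way to approximate the alignment-based variant the abstract mentions, again as an assumption about the setup rather than a description of it.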
