Quantifying the relationship of protein burying depth and sequence

Protein burying depth (BD) is a structural descriptor that indicates not only whether a residue is exposed or buried, but also how deeply it is buried. The widely used solvent accessible surface area mainly describes protein surface residues, whereas BD provides more detailed information about the arrangement of buried residues, which may be used to study deep-level protein structure and the formation of the protein folding nucleus. In this work, we analyse the relationship between protein BD and sequence, and describe it by nonlinear functions estimated with support vector machines. We examine the functions by cross-validation tests and find a strong correlation between residue BD and the local sequence environment. By further taking into account the size of the molecule in which a residue is located, we find that the correlation coefficient between predicted and observed depths improves from 0.60 to 0.65. Moreover, nearly half of the deepest 10% of residues in a protein sequence can be correctly predicted. Our study suggests that a residue's burying extent can be predicted, to some degree, from the residue itself and its local neighbouring residues. The methods used to estimate the sequence‐depth functions are expected to become more useful in the investigation of protein structures and folding mechanisms. Proteins 2008. © 2007 Wiley‐Liss, Inc.
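
To make the setup above concrete, the following is a minimal sketch of how a residue's local sequence environment might be encoded and how predictions might be scored. It is not the authors' implementation: the one-hot window encoding, the window half-width, the use of chain length as a stand-in for molecule size, and the function names `window_features`, `pearson`, and `deepest_fraction_recall` are all assumptions made for illustration. The regression step itself (support vector regression in the paper) is left out; any regressor could map these features to a depth value.

```python
import math

# The 20 standard amino acids, one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(sequence, center, half_width=2):
    """One-hot encode the residues in a window centred on `center`.

    Positions beyond the chain ends, and non-standard letters, are left
    as all-zero blocks. The chain length is appended as a crude proxy
    for molecule size (an assumption, not the paper's definition).
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                one_hot[idx] = 1.0
        features.extend(one_hot)
    features.append(float(len(sequence)))
    return features


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def deepest_fraction_recall(predicted, observed, fraction=0.10):
    """Fraction of the truly deepest residues that also rank among the
    deepest by prediction, using the same cut-off size for both lists."""
    n = len(observed)
    k = max(1, round(n * fraction))
    by_observed = sorted(range(n), key=lambda i: observed[i], reverse=True)
    by_predicted = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    return len(set(by_observed[:k]) & set(by_predicted[:k])) / k
```

Under these assumptions, `pearson` corresponds to the correlation coefficient between predicted and observed depths, and `deepest_fraction_recall` with `fraction=0.10` corresponds to the share of the deepest 10% of residues that are recovered.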
