Context dependent reference states of solvent accessibility derived from native protein structures and assessed by predictability analysis

Background: Solvent accessibility (ASA) of amino acid residues is often transformed from absolute values of exposed surface area to normalized relative values. This normalization is typically done by assuming a highest-exposure conformation based on the extended state of the residue when it is flanked by Ala or Gly on both sides, i.e. the solvent-exposed area in Ala-X-Ala or Gly-X-Gly. The exact sequence context, the folding state of the residues, and the actual environment of a folded protein all impose additional constraints on the highest possible (or highest observed) values of ASA, yet they are currently ignored. Here, we analyze the statistics of these constraints and examine how normalizing absolute ASA values by the context-dependent Highest Observed ASA (HOA), instead of the context-free extended state ASA (ESA) of residues, influences the performance of sequence-based prediction of solvent accessibility. Characterizing the buried and exposed states of residues with this normalization is also shown to provide better enrichment of DNA-binding sites among exposed residues.

Results: We compiled the statistics of the highest observed ASA (HOA) of residues in their different contexts and analyzed their distribution over all 400 possible combinations of neighboring residues for each residue type. We observe that many tripeptides are more exposed than ESA and that HOA residues are often found in turn, coil and bend conformations. On the other hand, several residues are never observed in an exposure state close to their ESA values. A neural network trained with HOA-normalized data outperforms one trained with ESA-normalized values. However, the improvements are subtle for some residues and more significant for others.

Conclusion: HOA-based normalization of solvent accessibility from native structures is proposed; it improves sequence-based predictability and the enrichment of interface residues on the surface. There may still be some difference between the highest possible ASA and the highest observed ASA because the space of ASA distributions is insufficiently covered in the PDB, which limits the overall improvement in prediction to a relatively modest degree.
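The normalization contrasted above can be illustrated with a minimal sketch. It assumes that each observation is a tripeptide (left neighbor, residue, right neighbor) paired with the absolute ASA of the central residue, and that the HOA of a context is simply the maximum ASA seen for that tripeptide. The function names, the fallback to ESA for unseen contexts, the cap at 1.0, and the numerical ESA values are all hypothetical placeholders, not values or procedures taken from the paper.

```python
# Placeholder extended-state ASA values in square angstroms.
# These numbers are illustrative assumptions, not the paper's ESA table.
ESA = {"A": 110.0, "G": 85.0, "K": 200.0}


def build_hoa_table(observations):
    """Map each tripeptide context to its highest observed ASA.

    observations: iterable of (tripeptide, asa) pairs, where tripeptide is a
    3-letter string and asa is the absolute ASA of the central residue.
    """
    hoa = {}
    for tripeptide, asa in observations:
        key = tuple(tripeptide)
        # Keep the maximum ASA seen so far for this context.
        hoa[key] = max(hoa.get(key, 0.0), asa)
    return hoa


def relative_asa(tripeptide, asa, hoa_table, esa_table=ESA):
    """Normalize an absolute ASA by the context-dependent reference.

    Falls back to the context-free ESA of the central residue when the
    tripeptide was never observed (an assumption of this sketch), and caps
    the result at 1.0 (also an assumption).
    """
    key = tuple(tripeptide)
    reference = hoa_table.get(key, esa_table[key[1]])
    return min(asa / reference, 1.0)


# Toy data: two observations of Ala-Lys-Ala and one of Gly-Lys-Gly.
observations = [("AKA", 150.0), ("AKA", 180.0), ("GKG", 210.0)]
hoa_table = build_hoa_table(observations)
```

With this toy table, `relative_asa("AKA", 90.0, hoa_table)` is normalized by the observed maximum of 180.0 for that context, whereas an unseen context such as `"AKG"` is normalized by the placeholder ESA of the central residue.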
