Real value prediction of protein solvent accessibility using enhanced PSSM features

Background: Prediction of protein solvent accessibility, also called accessible surface area (ASA) prediction, is an important step in predicting tertiary structure directly from one-dimensional sequences. Traditionally, solvent accessibility prediction has been treated as either a two-state (exposed or buried) or a three-state (exposed, intermediate or buried) classification problem. However, the states of solvent accessibility are not well defined in real protein structures. A number of methods have therefore been developed to predict the real-valued ASA directly from evolutionary information such as the position-specific scoring matrix (PSSM).

Results: This study enhances PSSM-based features for real-value ASA prediction by considering the physicochemical properties and solvent propensities of amino acid types. We propose a systematic method for identifying residue groups with respect to protein solvent accessibility. The amino acid columns in the PSSM profile that belong to a given residue group are merged to generate novel features. Finally, support vector regression (SVR) is adopted to construct a real-value ASA predictor. Experimental results demonstrate that the features produced by the proposed selection process are informative for ASA prediction.

Conclusion: Experimental results on a widely used benchmark show that the proposed method performs best among several existing ASA prediction packages. Furthermore, the feature selection mechanism incorporated in this study can be applied to other regression problems that use the PSSM. The program and data are available from the authors upon request.
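The column-merging step described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `merge_pssm_columns`, the two residue groups, and the choice to merge by summing scores are all assumptions made for illustration, since the abstract specifies neither the groups that were selected nor the merge operation. The column order assumed here is the one used in PSI-BLAST ASCII PSSM output.

```python
# Column order of the 20 amino acids in a PSI-BLAST ASCII PSSM (assumed input layout).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Hypothetical residue groups, for illustration only; the paper selects its
# groups systematically with respect to solvent accessibility.
EXAMPLE_GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "charged": "DEKR",
}


def merge_pssm_columns(pssm, groups, alphabet=AMINO_ACIDS):
    """Return a copy of the PSSM with one extra feature per residue group.

    Each extra feature is the sum of the PSSM scores in the columns whose
    amino acids belong to that group (summation is an assumption here).
    """
    column_of = {aa: i for i, aa in enumerate(alphabet)}
    enhanced = []
    for row in pssm:
        if len(row) != len(alphabet):
            raise ValueError("each PSSM row must have one score per amino acid")
        # One merged score per group, in the iteration order of `groups`.
        merged = [sum(row[column_of[aa]] for aa in members)
                  for members in groups.values()]
        enhanced.append(list(row) + merged)
    return enhanced


if __name__ == "__main__":
    # Two toy PSSM rows (one per sequence position), 20 scores each.
    toy_pssm = [[1] * 20, list(range(20))]
    for features in merge_pssm_columns(toy_pssm, EXAMPLE_GROUPS):
        print(len(features), features[20:])
```

Each enhanced row could then serve as the per-residue input to a regressor such as SVR, typically after concatenating the rows in a sliding window around the target residue; the window size and regressor settings are not stated in the abstract.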
