Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins

MOTIVATION: Solvent exposure of amino acid residues plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures, including solvent accessible surface area (ASA), residue depth (RD) and contact number (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced. It separates the contacts of a residue into those falling within the upper and lower half spheres, which are defined according to either the Cα-Cβ vector (HSEβ) or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to describe solvent exposure better than ASA, CN and RD in many applications. A sequence-based prediction is therefore desirable, as most proteins do not have experimentally determined structures. To the best of our knowledge, there is no method to predict HSEα and only one method to predict HSEβ.

RESULTS: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved consistent performance in 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for the upper half sphere, 0.69 for the lower half sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than those of existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) with the stability change upon residue mutation than predicted HSEβ (0.37) and ASA (0.43). These results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction.

AVAILABILITY AND IMPLEMENTATION: The method is available at http://sparks-lab.org.

CONTACT: yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
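To make the quantities in the abstract concrete, the following is a minimal sketch of how half-sphere exposure could be computed from known coordinates. It is not the SPIDER-HSE predictor, which works from sequence. The 13 Å radius, the function names, and the choice of the HSEα direction as the sum of the two vectors pointing from the neighboring Cα atoms to the central Cα are assumptions made for illustration; the abstract only states that HSEα is defined from neighboring Cα-Cα vectors.

```python
import numpy as np


def half_sphere_exposure(ca, direction, radius=13.0):
    """Count C-alpha contacts in the upper and lower half spheres.

    ca:        (N, 3) array of C-alpha coordinates.
    direction: (N, 3) array; direction[i] points into the upper half
               sphere of residue i.
    radius:    contact cutoff in angstroms (13.0 is an assumed default).
    Returns (up, down); the contact number is up + down.
    """
    ca = np.asarray(ca, dtype=float)
    direction = np.asarray(direction, dtype=float)

    # diff[i, j] is the vector from residue i to residue j.
    diff = ca[None, :, :] - ca[:, None, :]
    dist = np.linalg.norm(diff, axis=2)

    # Neighbors lie within the sphere; a residue is not its own neighbor.
    neighbor = dist <= radius
    np.fill_diagonal(neighbor, False)

    # The sign of the projection onto direction[i] picks the half sphere.
    proj = np.einsum("ijk,ik->ij", diff, direction)
    up = np.sum(neighbor & (proj >= 0), axis=1)
    down = np.sum(neighbor & (proj < 0), axis=1)
    return up, down


def hse_beta_directions(ca, cb):
    """HSE-beta: the upper half sphere follows the C-alpha -> C-beta vector."""
    return np.asarray(cb, dtype=float) - np.asarray(ca, dtype=float)


def hse_alpha_directions(ca):
    """HSE-alpha (assumed form): sum of the vectors CA(i-1)->CA(i) and
    CA(i+1)->CA(i), which needs C-alpha atoms only.

    Terminal residues have no two neighbors and get a zero vector here;
    their counts are not meaningful and should be discarded by the caller.
    """
    ca = np.asarray(ca, dtype=float)
    d = np.zeros_like(ca)
    d[1:-1] = (ca[1:-1] - ca[:-2]) + (ca[1:-1] - ca[2:])
    return d
```

With coordinates taken from a structure file, `half_sphere_exposure(ca, hse_alpha_directions(ca))` would give the measured HSEα values against which sequence-based predictions are correlated.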
