AcconPred: Predicting Solvent Accessibility and Contact Number Simultaneously by a Multitask Learning Framework under the Conditional Neural Fields Model

Motivation. The solvent accessibility of protein residues is one of the driving forces of protein folding, while the contact number of protein residues limits the possible protein conformations. The de novo prediction of these properties from protein sequence is important for the study of protein structure and function. Although these two properties are clearly related to each other, it is challenging to exploit this dependency for prediction.

Method. We present AcconPred, a method for predicting solvent accessibility and contact number simultaneously, based on a shared-weight multitask learning framework under the CNF (conditional neural fields) model. A multitask learning framework trained on a collection of related tasks provides more accurate predictions than one trained on a single task. The CNF method not only models the complex relationship between the input features and the predicted labels, but also exploits the interdependency among adjacent labels.

Results. Trained on a dataset of 5729 monomeric soluble globular proteins, AcconPred reaches 0.68 three-state accuracy for solvent accessibility and 0.75 correlation for contact number. Tested on 105 CASP11 domains for solvent accessibility, AcconPred reaches 0.64 accuracy, which outperforms existing methods.
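The shared-weight idea described under Method can be illustrated with a minimal sketch. This is not the AcconPred implementation: the class name, layer sizes, random initialization, and the choice of a real-valued output for contact number are all assumptions made for illustration, and the CNF terms that couple adjacent labels are omitted entirely. The sketch only shows one hidden layer whose weights are shared by two task-specific output heads.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (hypothetical initialization)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedWeightMultitask:
    """Illustrative two-task model: one shared hidden layer, two heads."""

    def __init__(self, n_features, n_hidden, n_acc_states=3, seed=0):
        rng = random.Random(seed)
        # Shared weights: both tasks read the same hidden representation.
        self.shared = make_matrix(n_hidden, n_features, rng)
        # Task-specific heads (assumed shapes, not taken from the paper).
        self.acc_head = make_matrix(n_acc_states, n_hidden, rng)
        self.cn_head = make_matrix(1, n_hidden, rng)

    def forward(self, features):
        hidden = [math.tanh(h) for h in matvec(self.shared, features)]
        acc_probs = softmax(matvec(self.acc_head, hidden))  # three-state accessibility
        contact_number = matvec(self.cn_head, hidden)[0]    # real-valued score
        return acc_probs, contact_number


model = SharedWeightMultitask(n_features=4, n_hidden=5)
acc_probs, contact_number = model.forward([0.2, -0.1, 0.5, 1.0])
```

In a training loop, the losses of both heads would be summed so that gradients from each task update the shared layer; that is the mechanism by which one task can inform the other in a setup like this.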
