Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates

MOTIVATION In recent years, development of a single-method fold-recognition server lags behind consensus and multiple template techniques. However, a good consensus prediction relies on the accuracy of individual methods. This article reports our efforts to further improve a single-method fold recognition technique called SPARKS by changing the alignment scoring function and incorporating the SPINE-X techniques that make improved prediction of secondary structure, backbone torsion angle and solvent accessible surface area. RESULTS The new method called SPARKS-X was tested with the SALIGN benchmark for alignment accuracy, Lindahl and SCOP benchmarks for fold recognition, and CASP 9 blind test for structure prediction. The method is compared to several state-of-the-art techniques such as HHPRED and BoostThreader. Results show that SPARKS-X is one of the best single-method fold recognition techniques. We further note that incorporating multiple templates and refinement in model building will likely further improve SPARKS-X. AVAILABILITY The method is available as a SPARKS-X server at http://sparks.informatics.iupui.edu/

[1]  Jian Peng,et al.  Boosting Protein Threading Accuracy , 2009, RECOMB.

[2]  Johannes Söding,et al.  The HHpred interactive server for protein homology detection and structure prediction , 2005, Nucleic Acids Res..

[3]  M. Karplus,et al.  Evaluation of comparative protein modeling by MODELLER , 1995, Proteins.

[4]  Jian Peng,et al.  Low-homology protein threading , 2010, Bioinform..

[5]  William H. Press,et al.  Numerical Recipes in C, 2nd Edition , 1992 .

[6]  Saraswathi Vishveshwara,et al.  A graph spectral analysis of the structural similarity network of protein chains , 2005, Proteins.

[7]  Yaoqi Zhou,et al.  DDOMAIN: Dividing structures into domains using a normalized domain–domain interaction profile , 2007, Protein science : a publication of the Protein Society.

[8]  William H. Press,et al.  Numerical recipes in C , 2002 .

[9]  Yaoqi Zhou,et al.  SPARKS 2 and SP3 servers in CASP6 , 2005, Proteins.

[10]  E. Lindahl,et al.  Identification of related proteins on family, superfamily and fold level. , 2000, Journal of molecular biology.

[11]  Lars Malmström,et al.  Automated prediction of CASP‐5 structures using the Robetta server , 2003, Proteins.

[12]  Arne Elofsson,et al.  MaxSub: an automated measure for the assessment of protein structure prediction quality , 2000, Bioinform..

[13]  Hongyi Zhou,et al.  Distance‐scaled, finite ideal‐gas reference state improves structure‐derived potentials of mean force for structure selection and stability prediction , 2002, Protein science : a publication of the Protein Society.

[14]  Seung Yup Lee,et al.  Analysis of TASSER‐based CASP7 protein structure prediction results , 2007, Proteins.

[15]  Yaoqi Zhou,et al.  Improving the prediction accuracy of residue solvent accessibility and real‐value backbone torsion angles of proteins by guided‐learning through a two‐layer neural network , 2009, Proteins.

[16]  Arne Elofsson,et al.  Pcons.net: protein structure prediction meta server , 2007, Nucleic Acids Res..

[17]  Genki Terashi,et al.  Fams‐ace: A combined method to select the best model after remodeling all server models , 2007, Proteins.

[18]  Sitao Wu,et al.  MUSTER: Improving protein sequence profile–profile alignments by using multiple sources of structure information , 2008, Proteins.

[19]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[20]  Dong Xu,et al.  PROSPECT II: protein structure prediction program for genome-scale applications. , 2003, Protein engineering.

[21]  A. Elofsson,et al.  Hidden Markov models that use predicted secondary structures for fold recognition , 1999, Proteins.

[22]  Yaoqi Zhou,et al.  Achieving 80% ten‐fold cross‐validated accuracy for secondary structure prediction by large‐scale training , 2006, Proteins.

[23]  Randy J Read,et al.  Automated server predictions in CASP7 , 2007, Proteins.

[24]  Hongyi Zhou,et al.  Single‐body residue‐level knowledge‐based energy score combined with sequence‐profile and secondary structure information for fold recognition , 2004, Proteins.

[25]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[26]  J. Skolnick,et al.  TM-align: a protein structure alignment algorithm based on the TM-score , 2005, Nucleic acids research.

[27]  Yuedong Yang,et al.  Predicting continuous local structure and the effect of its substitution for secondary structure in fragment-free protein structure prediction. , 2009, Structure.

[28]  J. Skolnick,et al.  The PDB is a covering set of small protein structures. , 2003, Journal of molecular biology.

[29]  Yaoqi Zhou,et al.  Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all‐atom statistical energy functions , 2008, Protein science : a publication of the Protein Society.

[30]  B. Rost,et al.  Protein fold recognition by prediction-based threading. , 1997, Journal of molecular biology.

[31]  Michael J E Sternberg,et al.  Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre , 2008, Proteins.

[32]  M J Sippl,et al.  Structure-based evaluation of sequence comparison and fold recognition alignment accuracy. , 2000, Journal of molecular biology.

[33]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[34]  J. Bujnicki Protein-Structure Prediction by Recombination of Fragments , 2006 .

[35]  Yang Zhang,et al.  Template‐based modeling and free modeling by I‐TASSER in CASP7 , 2007, Proteins.

[36]  A. Sali,et al.  Alignment of protein sequences by their profiles , 2004, Protein science : a publication of the Protein Society.

[37]  J. Skolnick,et al.  On the origin and highly likely completeness of single-domain protein structures. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[38]  David T. Jones,et al.  pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination , 2009, Bioinform..

[39]  Lukasz A. Kurgan,et al.  SPINE X: Improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles , 2012, J. Comput. Chem..

[40]  Hongyi Zhou,et al.  Fold recognition by combining sequence profiles derived from evolution and from depth‐dependent structural alignment of fragments , 2004, Proteins.

[41]  Jeffrey Skolnick,et al.  Improving threading algorithms for remote homology modeling by combining fragment and template comparisons , 2010, Proteins.

[42]  Martin Madera,et al.  Profile Comparer: a program for scoring and aligning profile hidden Markov models , 2008, Bioinform..

[43]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[44]  Song Liu,et al.  Fold recognition by concurrent use of solvent accessibility and residue depth , 2007, Proteins.

[45]  T L Blundell,et al.  FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. , 2001, Journal of molecular biology.

[46]  Pierre Baldi,et al.  A machine learning information retrieval approach to protein fold recognition. , 2006, Bioinformatics.

[47]  Y. Duan,et al.  Trends in template/fragment-free protein structure prediction , 2010, Theoretical chemistry accounts.

[48]  Alfonso Valencia,et al.  Assessment of predictions submitted for the CASP6 comparative modeling category , 2005, Proteins.

[49]  Jinbo Xu Protein Structure Prediction by Linear Programming , 2003 .

[50]  Yaoqi Zhou,et al.  Characterizing the existing and potential structural space of proteins by large-scale multiple loop permutations. , 2011, Journal of molecular biology.