Low-homology protein threading

Motivation: The challenge of template-based modeling lies in recognizing correct templates and generating accurate sequence-template alignments. Homologous information has proved very powerful in detecting remote homologs, as demonstrated by the state-of-the-art profile-based method HHpred. However, HHpred does not fare well when the proteins under consideration are low-homology. A protein is low-homology if a sufficient amount of homologous information for it cannot be obtained from existing protein sequence databases.

Results: We present a profile-entropy dependent scoring function for low-homology protein threading. This method models the correlation among various protein features and determines their relative importance according to the amount of homologous information available. When the proteins under consideration are low-homology, our method relies more on structure information; otherwise, it relies more on homologous information. Experimental results indicate that our threading method greatly outperforms the best profile-based method, HHpred, and all the top CASP8 servers on low-homology proteins. Tested on the CASP8 hard targets, our threading method is also better than all the top CASP8 servers except Zhang-Server, which it trails slightly. This is significant considering that Zhang-Server and other top CASP8 servers combine multiple structure-prediction techniques, including consensus methods, multiple-template modeling, template-free modeling and model refinement, while our method is a classical single-template-based threading method without any post-threading refinement.

Contact: jinboxu@gmail.com
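The idea in the Results of shifting weight between structure and homologous information according to profile entropy can be illustrated with a small sketch. This is not the authors' scoring function: the abstract does not give its form, and the sketch below uses a simple linear blend that does not model correlation among features. The diversity measure (the exponential of the mean column entropy), the logistic weight, and the names `neff`, `structure_weight`, `combined_score`, `midpoint` and `steepness` are all assumptions made for illustration.

```python
import math


def column_entropy(freqs):
    """Shannon entropy (in nats) of one profile column of amino-acid frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def neff(profile):
    """Assumed diversity measure: exp of the mean column entropy.

    Ranges from 1 (every column fully conserved, little homologous
    information) up to the alphabet size (every column uniform).
    """
    mean_h = sum(column_entropy(col) for col in profile) / len(profile)
    return math.exp(mean_h)


def structure_weight(neff_value, midpoint=6.0, steepness=1.0):
    """Hypothetical logistic weight on structure information.

    Close to 1 when the profile is sparse (low neff), close to 0 when it
    is rich. The midpoint and steepness values are illustrative only.
    """
    return 1.0 / (1.0 + math.exp(steepness * (neff_value - midpoint)))


def combined_score(profile_score, structure_score, neff_value):
    """Blend the two score components according to profile diversity."""
    w = structure_weight(neff_value)
    return w * structure_score + (1.0 - w) * profile_score


# Toy profiles over a 20-letter alphabet, three columns each.
conserved = [[1.0] + [0.0] * 19 for _ in range(3)]  # low-homology-like
uniform = [[0.05] * 20 for _ in range(3)]           # highly diverse

low = neff(conserved)
high = neff(uniform)
print(low, structure_weight(low))
print(high, structure_weight(high))
```

With the conserved toy profile the weight on the structure term is close to 1, and with the diverse profile it is close to 0, which mirrors the behaviour the abstract describes in qualitative terms only.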
