Refined template selection and combination algorithm significantly improves template-based modeling accuracy

In contrast to ab initio protein modeling, comparative modeling is considered the most popular and reliable approach to modeling protein structure. However, selecting the best set of templates remains a major challenge. An effective template-ranking algorithm is developed to select only the reliable hits for predicting protein structures. The algorithm uses both pairwise and multiple sequence alignments of the template hits to rank them and select the best possible set of templates. It captures key sequence and structural information from the template hits and converts it into scores that are used to rank them. The selected set of templates is then used to model the target. The modeling accuracy of the algorithm is evaluated on CASP8, CASP9 and CASP10 targets containing TBM-HA domains. On average, this template ranking and selection algorithm improves GDT-TS, GDT-HA and TM-score by 3.531, 4.814 and 0.022, respectively. Further, it is shown that including structurally similar templates with ample conformational diversity is crucial for the modeling algorithm to span the target sequence maximally and reliably and to construct a near-native model. Optimal model sampling is also key to predicting the best possible target structure.
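
The general idea of converting per-template alignment features into a score and ranking the hits can be illustrated with a minimal sketch. This is not the algorithm evaluated here: the `TemplateHit` fields, the equal weights, the `min_score` cutoff and the `max_templates` limit are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateHit:
    name: str        # hypothetical template identifier
    identity: float  # fraction of identical aligned residues, in [0, 1]
    coverage: float  # fraction of the target spanned by the alignment, in [0, 1]


def combined_score(hit, w_identity=0.5, w_coverage=0.5):
    """Weighted sum of two alignment features (the weights are assumptions)."""
    return w_identity * hit.identity + w_coverage * hit.coverage


def rank_templates(hits):
    """Return the hits sorted from highest to lowest combined score."""
    return sorted(hits, key=combined_score, reverse=True)


def select_templates(hits, min_score=0.4, max_templates=3):
    """Keep the top-ranked hits whose score reaches an assumed cutoff."""
    ranked = rank_templates(hits)
    return [h for h in ranked if combined_score(h) >= min_score][:max_templates]


hits = [
    TemplateHit("tA", identity=0.30, coverage=0.90),
    TemplateHit("tB", identity=0.60, coverage=0.80),
    TemplateHit("tC", identity=0.20, coverage=0.30),
]
selected = select_templates(hits)
```

In this toy input, a hit with low identity and low coverage falls below the cutoff and is excluded, while the remaining hits are ordered by their combined score. A realistic scoring scheme would draw on more features, such as those derived from multiple sequence alignments and template structures.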
