A low‐complexity add‐on score for protein remote homology search with COMER

Motivation: Protein sequence alignment underlies comparative modeling, the most reliable approach to protein structure prediction, as well as many other applications. Alignment between sequence families, or profile‐profile alignment, is among the most sensitive means of homology detection, yet it still leaves room for improvement. We aim to improve both the quality of profile‐profile alignments and the sensitivity of homology detection based on them by refining profile‐profile substitution scores.

Results: We have developed a new score that serves as an additional component of profile‐profile substitution scores. A comprehensive evaluation shows that the add‐on score yields statistically significant improvements in both the sensitivity and the alignment quality of the COMER method. We discuss why the score leads to the improvement and its almost optimal computational complexity, which makes it easy to implement in any profile‐profile alignment method.

Availability and implementation: An implementation of the add‐on score in the open‐source COMER software, together with the data, is available at https://sourceforge.net/projects/comer. The COMER software is also available on GitHub at https://github.com/minmarg/comer and as a Docker image (minmar/comer).

Supplementary information: Supplementary data are available at Bioinformatics online.
