Fold recognition by scoring protein maps using the congruence coefficient

MOTIVATION: Protein fold recognition is a key step in template-based modeling approaches to protein structure prediction. Although closely related folds can be easily identified by homology search in sequence databases, fold recognition is notoriously more difficult when it requires identifying distantly related homologues. Recent progress in residue-residue contact and distance prediction opens up the possibility of improving fold recognition by using the structural information contained in predicted distance and contact maps.

RESULTS: Here we propose to use the congruence coefficient as a metric of similarity between maps. We prove that this metric has several mathematical properties that allow one to compute, in polynomial time, its exact mean and variance over all possible (exponentially many) alignments between two symmetric matrices, and thus to assess the statistical significance of the similarity between aligned maps. We perform fold recognition tests by recovering predicted target contact/distance maps from the two most recent CASP editions and over 27,000 non-homologous structural templates from the ECOD database. On this large benchmark, we compare the fold recognition performance of different alignment tools using their own similarity scores against the performance obtained using the congruence coefficient. We show that the congruence coefficient improves fold recognition overall relative to the other methods, demonstrating its effectiveness as a general similarity metric for protein map comparison.

AVAILABILITY: The congruence coefficient software CCpro is available as part of the SCRATCH suite at: http://scratch.proteomics.ics.uci.edu/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
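To make the central quantity concrete, the sketch below computes the congruence coefficient in its standard form, the sum of element-wise products of two equally sized maps divided by the product of their Frobenius norms. It is an illustration only and is not taken from CCpro. The significance estimate in particular is a stand-in: the abstract describes an exact polynomial-time mean and variance over all alignments, whereas this sketch merely samples random alignments. The function names (`congruence_coefficient`, `submatrix`, `sampled_z_score`) and the null model (random order-preserving selections of template positions) are assumptions made for this example.

```python
import math
import random
import statistics


def congruence_coefficient(a, b):
    """Congruence coefficient of two square maps of the same size."""
    n = len(a)
    if n != len(b) or any(len(r) != n for r in a) or any(len(r) != n for r in b):
        raise ValueError("maps must be square and of equal size")
    num = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    norm_a = math.sqrt(sum(x * x for row in a for x in row))
    norm_b = math.sqrt(sum(x * x for row in b for x in row))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("congruence coefficient is undefined for an all-zero map")
    return num / (norm_a * norm_b)


def submatrix(m, idx):
    """Restrict a square map to the rows and columns listed in idx."""
    return [[m[i][j] for j in idx] for i in idx]


def sampled_z_score(target, template, aligned_idx, n_samples=200, seed=0):
    """Monte Carlo z-score of an alignment (assumed null model, see lead-in).

    aligned_idx[k] is the template position aligned to target position k.
    Returns (observed, mean, std, z); z is None when std is zero.
    """
    rng = random.Random(seed)
    n, m = len(target), len(template)
    observed = congruence_coefficient(target, submatrix(template, aligned_idx))
    samples = []
    for _ in range(n_samples):
        # Random order-preserving choice of n template positions.
        idx = sorted(rng.sample(range(m), n))
        samples.append(congruence_coefficient(target, submatrix(template, idx)))
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    z = (observed - mean) / std if std > 0 else None
    return observed, mean, std, z


if __name__ == "__main__":
    # Toy distance-like template map; the target is cut out of it.
    template = [[(i - j) ** 2 for j in range(5)] for i in range(5)]
    target = submatrix(template, [0, 1, 4])
    print(sampled_z_score(target, template, [0, 1, 4]))
```

Because the coefficient is normalised by both norms, two maps that differ only by a positive scale factor score 1, which is why it can compare maps without first putting them on a common scale.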
