Bio‐knowledge‐based filters improve residue‐residue contact prediction accuracy

Motivation: Residue‐residue contact prediction through direct coupling analysis has reached impressive accuracy, but still higher accuracy will be needed to allow routine modelling of protein structures. One way to improve prediction accuracy is to filter the predicted contacts using knowledge about the particular protein of interest or knowledge about protein structures in general. Results: We focus on the latter and discuss a set of filters that can be used to remove false positive contact predictions. Each filter depends on one or a few cut‐off parameters, and we investigated how filter performance depends on them. For a test set of 851 protein domains, combining all filters with default parameters removed 29% of the predictions, of which 92% were indeed false positives. Availability and implementation: All data and scripts are available at http://comprec-lin.iiar.pwr.edu.pl/FPfilter/. Supplementary information: Supplementary data are available at Bioinformatics online.
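The two percentages reported in the Results can be read as (i) the fraction of predicted contacts that the filters remove and (ii) the fraction of those removed contacts that are false positives. The following minimal sketch shows how such numbers could be computed for a single protein. It is not the authors' implementation: the function `filter_stats`, the toy sequence-separation filter, and the example data are all hypothetical and serve only to illustrate the bookkeeping.

```python
from typing import Callable, Iterable, Set, Tuple

Contact = Tuple[int, int]  # pair of residue indices (i, j)


def filter_stats(
    predicted: Iterable[Contact],
    native: Set[Contact],
    keep: Callable[[Contact], bool],
) -> Tuple[float, float]:
    """Return (fraction of predictions removed,
               fraction of removed predictions that were false positives).

    `keep` is a hypothetical filter: it returns True for contacts to retain.
    `native` holds the contacts observed in the reference structure.
    """
    predicted = list(predicted)
    removed = [c for c in predicted if not keep(c)]
    removed_false_positives = [c for c in removed if c not in native]

    frac_removed = len(removed) / len(predicted) if predicted else 0.0
    frac_fp = len(removed_false_positives) / len(removed) if removed else 0.0
    return frac_removed, frac_fp


# Toy data (assumed, not from the paper): four predicted contacts,
# two of which are present in the native structure.
predicted_contacts = [(1, 2), (1, 10), (3, 20), (5, 7)]
native_contacts = {(1, 10), (5, 7)}


# Illustrative filter only: discard pairs closer than 6 residues in sequence.
def min_separation_filter(contact: Contact) -> bool:
    return abs(contact[0] - contact[1]) >= 6


frac_removed, frac_fp = filter_stats(
    predicted_contacts, native_contacts, min_separation_filter
)
print(f"removed {frac_removed:.0%} of predictions; "
      f"{frac_fp:.0%} of those were false positives")
```

A good filter combines a useful removal rate with a high false-positive fraction among the removed contacts, since every true contact that is removed is lost to subsequent structure modelling.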
