Predicting accurate contacts in thousands of Pfam domain families using PconsC3

Motivation: A few years ago it was shown that by using a maximum entropy approach to describe couplings between columns in a multiple sequence alignment it is possible to significantly increase the accuracy of residue contact predictions. For very large protein families with more than 1000 effective sequences the accuracy is sufficient to produce accurate models of proteins as well as complexes. Today, for about half of all Pfam domain families no structure is known, but unfortunately most of these families have at most a few hundred members, i.e. are too small for such contact prediction methods. Results: To extend accurate contact predictions to the thousands of smaller protein families we present PconsC3, a fast and improved method for protein contact predictions that can be used for families with even 100 effective sequence members. PconsC3 outperforms direct coupling analysis (DCA) methods significantly independent on family size, secondary structure content, contact range, or the number of selected contacts. Availability and implementation: PconsC3 is available as a web server and downloadable version at http://c3.pcons.net. The downloadable version is free for all to use and licensed under the GNU General Public License, version 2. At this site contact predictions for most Pfam families are also available. We do estimate that more than 4000 contact maps for Pfam families of unknown structure have more than 50% of the top‐ranked contacts predicted correctly. Contact: arne@bioinfo.se Supplementary information: Supplementary data are available at Bioinformatics online.

[1]  D. Baker,et al.  Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information , 2014, eLife.

[2]  C. Sander,et al.  Direct-coupling analysis of residue coevolution captures native contacts across many protein families , 2011, Proceedings of the National Academy of Sciences.

[3]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[4]  David T. Jones,et al.  MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins , 2014, Bioinform..

[5]  T. Petersen,et al.  A generic method for assignment of reliability scores applied to solvent accessibility predictions , 2009, BMC Structural Biology.

[6]  Erik van Nimwegen,et al.  Disentangling Direct from Indirect Co-Evolution of Residues in Protein Alignments , 2010, PLoS Comput. Biol..

[7]  Yang Zhang,et al.  Scoring function for automated assessment of protein structure template quality , 2004, Proteins.

[8]  Magnus Ekeberg,et al.  Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences , 2014, J. Comput. Phys..

[9]  Thomas A. Hopf,et al.  Sequence co-evolution gives 3D contacts and structures of protein complexes , 2014, eLife.

[10]  Christodoulos A Floudas,et al.  Alpha-helical topology prediction and generation of distance restraints in membrane proteins. , 2008, Biophysical journal.

[11]  D. Baker,et al.  Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era , 2013, Proceedings of the National Academy of Sciences.

[12]  C. Sander,et al.  All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences , 2015, Proceedings of the National Academy of Sciences.

[13]  C. Sander,et al.  Correlated mutations and residue contacts in proteins , 1994, Proteins.

[14]  C. Sander,et al.  Correlated Mutations and Residue Contacts , 1994 .

[15]  Carlo Baldassi,et al.  Fast and Accurate Multivariate Gaussian Modeling of Protein Families: Predicting Residue Contacts and Protein-Interaction Partners , 2014, PloS one.

[16]  Thomas A. Hopf,et al.  Protein 3D Structure Computed from Evolutionary Sequence Variation , 2011, PloS one.

[17]  Marcin J. Skwark,et al.  Improved Contact Predictions Using the Recognition of Protein Like Contact Patterns , 2014, PLoS Comput. Biol..

[18]  Zhiyong Wang,et al.  Predicting protein contact map using evolutionary and physical constraints by integer programming , 2013, Bioinform..

[19]  Burkhard Rost,et al.  FreeContact: fast and free software for protein contact prediction from residue co-evolution , 2014, BMC Bioinformatics.

[20]  Marcin J. Skwark,et al.  PconsFold: improved contact predictions improve protein models , 2014, Bioinform..

[21]  Jens Meiler,et al.  CASP6 assessment of contact prediction , 2005, Proteins.

[22]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[23]  S. Eddy,et al.  Pfam: the protein families database , 2013, Nucleic Acids Res..

[24]  Jianlin Cheng,et al.  CONFOLD: Residue‐residue contact‐guided ab initio protein folding , 2015, Proteins.

[25]  Markus Gruber,et al.  CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations , 2014, Bioinform..

[26]  Pierre Baldi,et al.  Improved residue contact prediction using support vector machines and a large feature set , 2007, BMC Bioinformatics.

[27]  Pierre Baldi,et al.  Deep architectures for protein contact map prediction , 2012, Bioinform..

[28]  E. Aurell,et al.  Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. , 2012, Physical review. E, Statistical, nonlinear, and soft matter physics.

[29]  T. Hwa,et al.  Identification of direct residue contacts in protein–protein interaction by message passing , 2009, Proceedings of the National Academy of Sciences.

[30]  A. Biegert,et al.  HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment , 2011, Nature Methods.

[31]  Erik Aurell,et al.  The Maximum Entropy Fallacy Redux? , 2016, PLoS Comput. Biol..

[32]  Timothy Nugent,et al.  Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis , 2012, Proceedings of the National Academy of Sciences.

[33]  R Dustin Schaeffer,et al.  Manual classification strategies in the ECOD database , 2015, Proteins.

[34]  Massimiliano Pontil,et al.  PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments , 2012, Bioinform..

[35]  Jinbo Xu,et al.  Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model , 2016 .

[36]  Thomas A. Hopf,et al.  Protein structure prediction from sequence variation , 2012, Nature Biotechnology.

[37]  Thomas A. Hopf,et al.  Three-Dimensional Structures of Membrane Proteins from Genomic Sequencing , 2012, Cell.