Characteristics of protein residue-residue contacts and their application in contact prediction

Contact sites between amino acids characterize important structural features of a protein. We investigated the characteristics of contact sites in a representative set of proteins and their relation to protein class and topology. For this purpose, we used a non-redundant set of 5872 protein domains that are classified identically by the CATH and SCOP databases. The proteins represented the alpha, beta, and alpha+beta classes. Contact maps of the protein structures were obtained for a selected set of physical distances in the main backbone and separations in the protein sequence. For each set, the dependency between contact degree and the distance parameters was quantified. We identified the residues that form contact sites most frequently and the unique amino acid pairs that create contact sites most often within each structural class. The contact characteristics of specific topologies were compared with those of their protein classes, revealing protein groups with distinctive contact characteristics. We showed that our results can be used to improve the performance of direct coupling analysis, a recent top-performing contact predictor. Our work provides contact site propensity values that can be incorporated into bioinformatics databases.
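As an illustration of the quantities named above, the following minimal sketch builds a contact map from backbone coordinates, derives the contact degree of each residue, and counts the amino acid pairs that form contacts. It is not the authors' implementation. The use of one representative atom per residue (for example C-alpha), the 8.0 Å distance threshold, the minimum sequence separation, and all function names are assumptions made for the example.

```python
from collections import Counter
from math import dist


def contact_map(coords, threshold=8.0, min_separation=6):
    """Return the set of index pairs (i, j), i < j, that are in contact.

    coords: one (x, y, z) tuple per residue (assumed: C-alpha positions).
    threshold: maximum distance in angstroms (assumed value).
    min_separation: minimum |i - j| along the sequence (assumed value).
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if dist(coords[i], coords[j]) <= threshold:
                contacts.add((i, j))
    return contacts


def contact_degree(contacts, n_residues):
    """Number of contacts each residue takes part in."""
    degree = [0] * n_residues
    for i, j in contacts:
        degree[i] += 1
        degree[j] += 1
    return degree


def pair_counts(contacts, sequence):
    """Count contacts per unordered amino acid pair, e.g. ('A', 'D')."""
    counts = Counter()
    for i, j in contacts:
        pair = tuple(sorted((sequence[i], sequence[j])))
        counts[pair] += 1
    return counts


# Toy input: ten residues on a straight line, 3.8 angstroms apart.
toy_sequence = "ACDEFGHIKL"
toy_coords = [(3.8 * k, 0.0, 0.0) for k in range(len(toy_sequence))]

toy_contacts = contact_map(toy_coords, threshold=8.0, min_separation=2)
toy_degree = contact_degree(toy_contacts, len(toy_sequence))
toy_pairs = pair_counts(toy_contacts, toy_sequence)

print(sorted(toy_contacts))
print(toy_degree)
print(toy_pairs.most_common(3))
```

Repeating the calculation over a grid of `threshold` and `min_separation` values would give the kind of dependency between contact degree and distance parameters that the abstract describes. Turning raw pair counts into propensities additionally requires normalizing by amino acid frequencies, which this sketch leaves out.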

