Improving Protein Docking with Constraint Programming and Coevolution Data

Background Constraint programming (CP) is usually seen as a rigid approach, focusing on crisp, precise distinctions between what is allowed as a solution and what is not. At first sight, this makes it seem inadequate for bioinformatics applications that rely mostly on statistical parameters and optimization. The prediction of protein interactions, or protein docking, is one such application, and this apparent problem with CP is particularly evident when constraints are provided by noisy data, as is the case when using the statistical analysis of Multiple Sequence Alignments (MSA) to extract coevolution information. The goal of this paper is to show that this first impression is misleading and that CP is a useful technique for improving protein docking even with data as vague and noisy as the coevolution indicators that can be inferred from MSA.
Results Here we focus on the study of two protein complexes. In one case we used a simplified estimator of interaction propensity to infer a set of five candidate residues for the interface and used that set to constrain the docking models. Even with this simplified approach, and considering only the interface of one of the partners, there is a visible focusing of the models around the correct configuration. Considering a set of 400 models with the best geometric contacts, this constraint increases the number of models close to the target (RMSD < 5 Å) from 2 to 5 and decreases the RMSD of all retained models from 26 Å to 17.5 Å. For the other example we used a more standard estimate of coevolving residues, from the Co-Evolution Analysis using Protein Sequences (CAPS) software. Using a group of three residues identified from the sequence alignment as potentially coevolving to constrain the search, the number of complexes similar to the target among the 50 highest-scoring docking models increased from 3 in the unconstrained docking to 30 in the constrained docking.
Conclusions Although this is only a proof-of-concept application, our results show that, with suitably designed constraints, CP allows us to integrate coevolution data, which can be inferred from databases of protein sequences, even though the data are noisy and often "fuzzy", with no well-defined discontinuities. This also shows, more generally, that CP in bioinformatics need not be limited to the crisper cases of finite domains and explicit rules but can also be applied to a broader range of problems that depend on statistical measurements and continuous data.
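
As an illustration of how a set of candidate interface residues could constrain docking models, the following is a minimal sketch and not the method used in the paper. It assumes one plausible form of the constraint: a model is retained only if at least `min_contacts` of the candidate residues lie within a distance `cutoff` of some atom of the partner. The function names, the 5 Å default cutoff, the model representation, and the toy coordinates are all hypothetical choices made for this example.

```python
import math

def count_contacts(candidate_coords, partner_coords, cutoff=5.0):
    """Count candidate residues that have at least one partner atom
    within `cutoff` (in Å). The cutoff value is an assumption."""
    return sum(
        1
        for c in candidate_coords
        if any(math.dist(c, p) <= cutoff for p in partner_coords)
    )

def filter_models(models, candidate_coords, min_contacts, cutoff=5.0):
    """Keep only the models in which at least `min_contacts` of the
    candidate residues are in contact with the partner."""
    return [
        m
        for m in models
        if count_contacts(candidate_coords, m["partner_coords"], cutoff) >= min_contacts
    ]

# Toy data (hypothetical): three candidate residues on the fixed protein,
# each represented by a single point.
candidates = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]

# Two hypothetical docking models, each giving partner atom positions
# in the reference frame of the fixed protein.
models = [
    {"name": "A", "partner_coords": [(0.0, 0.0, 3.0), (10.0, 0.0, 4.0)]},
    {"name": "B", "partner_coords": [(0.0, 0.0, 30.0)]},
]

# Require at least two of the three candidates to be at the interface.
retained = filter_models(models, candidates, min_contacts=2)
print([m["name"] for m in retained])
```

Requiring only some of the candidates to be in contact, rather than all of them, is one way to tolerate noisy predictions: a few wrongly predicted residues do not eliminate an otherwise good model.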
