Maximum Contact Map Overlap Revisited

Among the measures for quantifying the similarity between three-dimensional (3D) protein structures, maximum contact map overlap (CMO) has received sustained attention during the past decade. Despite this, the known algorithms exhibit modest performance and do not scale to large-scale comparison. This article offers a clear advance in this respect. We present a new integer programming model for CMO and propose an exact branch-and-bound algorithm whose bounds are obtained by a novel Lagrangian relaxation. The efficiency of the approach is demonstrated on a popular small benchmark (the Skolnick set, 40 domains). On this set, our algorithm significantly outperforms the best existing exact algorithms, and many hard CMO instances are solved for the first time. To further assess our approach, we constructed a large-scale set of 300 protein domains. Computing the similarity measure for each of the 44,850 pairs, we obtained a classification in excellent agreement with SCOP. Supplementary Material is available at www.liebertonline.com/cmb.
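To make the objective concrete, the sketch below computes the maximum contact map overlap of two tiny contact maps by exhaustive enumeration of order-preserving alignments. It is not the branch-and-bound algorithm or the Lagrangian relaxation of this article; it only illustrates the quantity those methods optimize, under the standard assumption that a contact map is a set of residue index pairs and that an alignment must preserve residue order. The function names and the toy data are hypothetical, and the enumeration is exponential, so it is usable only for a handful of residues.

```python
from itertools import combinations


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of A whose two endpoints are aligned onto a contact of B."""
    mapping = dict(alignment)
    return sum(
        1
        for i, k in contacts_a
        if i in mapping and k in mapping
        and (mapping[i], mapping[k]) in contacts_b
    )


def max_contact_map_overlap(n_a, contacts_a, n_b, contacts_b):
    """Brute-force CMO: try every order-preserving partial alignment.

    n_a, n_b are the numbers of residues; contacts are pairs of 0-based indices.
    Returns (best overlap, one alignment achieving it).
    """
    # Store each contact as (smaller index, larger index) so that an
    # order-preserving mapping sends a sorted pair to a sorted pair.
    contacts_a = {tuple(sorted(c)) for c in contacts_a}
    contacts_b = {tuple(sorted(c)) for c in contacts_b}
    best_score, best_alignment = 0, []
    for size in range(1, min(n_a, n_b) + 1):
        for rows in combinations(range(n_a), size):
            for cols in combinations(range(n_b), size):
                # Pairing two increasing index tuples keeps residue order.
                alignment = list(zip(rows, cols))
                score = overlap(contacts_a, contacts_b, alignment)
                if score > best_score:
                    best_score, best_alignment = score, alignment
    return best_score, best_alignment


# Toy instance (made-up data): protein A has 4 residues, protein B has 5.
score, alignment = max_contact_map_overlap(
    4, [(0, 2), (1, 3), (0, 3)],
    5, [(0, 3), (1, 4), (0, 4)],
)
print(score, alignment)
```

On this toy instance all three contacts of A can be matched by skipping residue 2 of B. An all-against-all comparison of 300 domains needs 300 × 299 / 2 = 44,850 such pairwise computations, which is why an exact method with strong bounds matters at that scale.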
