Towards optimal alignment of protein structure distance matrices

MOTIVATION: Structural alignments of proteins are important for identifying structural similarities, detecting homology and annotating function. The structural alignment problem is well studied and computationally difficult. Many scoring schemes for structural similarity, and many algorithms for finding high-scoring alignments, have been proposed. Algorithms that use contact map overlap (CMO) as the scoring function are currently the only practical algorithms able to compute provably optimal alignments.

RESULTS: We propose a new mathematical model for the alignment of inter-residue distance matrices, building upon previous work on maximum CMO. Our model includes all the elements needed to emulate various scoring schemes for the alignment of protein distance matrices. The algorithm we use to compute alignments is practical only for sparse distance matrices. We therefore propose a more efficient scoring function that uses a distance threshold and only positive structural scores, so that the matrices stay sparse. We show that, even under these restrictions, our approach is competitive in alignment accuracy with state-of-the-art structural alignment algorithms. In addition, it either proves that an alignment is optimal or returns bounds on the optimal score. Our method is freely available and is an important step towards truly provably optimal structural alignments of proteins.

AVAILABILITY: An executable of our program PAUL is available at http://planet-lisa.net/.
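The abstract builds on maximum contact map overlap. The snippet below is a minimal sketch of the classical CMO objective. It is not PAUL's own scoring function.

It assumes C-alpha coordinates as input. It uses an illustrative 7.5 Å contact cutoff and a minimum sequence separation of 2; both are hypothetical choices, not parameters taken from the paper. The snippet thresholds the pairwise distances into a sparse contact set. It then scores a given order-preserving alignment by counting the contacts (i, j) of one protein whose aligned partners (k, l) also form a contact in the other. It only evaluates a fixed alignment. Finding the alignment that maximizes this score is the hard optimization problem the abstract refers to.

```python
"""Sketch: classical contact map overlap (CMO) score for a fixed alignment.

CONTACT_THRESHOLD and MIN_SEQ_SEPARATION are hypothetical, illustrative
values; they are not taken from PAUL.
"""
import math
from itertools import combinations

CONTACT_THRESHOLD = 7.5  # Angstrom; hypothetical distance cutoff
MIN_SEQ_SEPARATION = 2   # skip pairs that are direct chain neighbours


def contact_map(coords, threshold=CONTACT_THRESHOLD, min_sep=MIN_SEQ_SEPARATION):
    """Threshold the inter-residue distance matrix into a sparse set of contacts.

    Returns all pairs (i, j) with i < j, j - i >= min_sep and distance < threshold.
    """
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_sep and math.dist(coords[i], coords[j]) < threshold:
            contacts.add((i, j))
    return contacts


def is_order_preserving(alignment):
    """True if residue indices increase strictly in both proteins (no crossings)."""
    return all(a1 < a2 and b1 < b2
               for (a1, b1), (a2, b2) in zip(alignment, alignment[1:]))


def cmo_score(alignment, contacts_a, contacts_b):
    """Count contacts of protein A whose aligned partners form a contact in protein B."""
    if not is_order_preserving(alignment):
        raise ValueError("alignment must be an order-preserving list of (i, k) pairs")
    partner = dict(alignment)  # residue index in A -> aligned residue index in B
    score = 0
    for i, j in contacts_a:
        # Order preservation guarantees partner[i] < partner[j], matching contacts_b's (k, l) order.
        if i in partner and j in partner and (partner[i], partner[j]) in contacts_b:
            score += 1
    return score


if __name__ == "__main__":
    # Toy C-alpha traces with 3.8 A steps: a folded square and a straight line.
    square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    line = [(3.8 * i, 0.0, 0.0) for i in range(4)]
    identity = [(i, i) for i in range(4)]

    print(sorted(contact_map(square)))  # [(0, 2), (0, 3), (1, 3)]
    print(cmo_score(identity, contact_map(square), contact_map(square)))  # 3
    print(cmo_score(identity, contact_map(square), contact_map(line)))    # 0
```

The contact set stores only the pairs that fall below the cutoff. For compact protein structures, the number of such pairs grows roughly linearly with chain length rather than quadratically. This is the kind of sparsity that a distance threshold provides and that the abstract says the alignment algorithm needs.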
