Extracting multiple structural alignments from pairwise alignments: a comparison of a rigorous and a heuristic approach

MOTIVATION Multiple structural alignments (MSTAs) provide position-specific information on the sequence variability allowed by protein folds. This information can be exploited to better understand the evolution of proteins and the physical chemistry of polypeptide folding. Most MSTA methods rely on a pre-computed library of pairwise alignments. This library will in general contain conflicting residue equivalences not all of which can be realized in the final MSTA. Hence to build a consistent MSTA, these methods have to select a conflict-free subset of equivalences. RESULTS Using a dataset with 327 families from SCOP 1.63 we compare the ability of two different methods to select an optimal conflict-free subset of equivalences. One is an implementation of Reinert et al.'s integer linear programming formulation (ILP) of the maximum weight trace problem (Reinert et al., 1997, Proc. 1st Ann. Int. Conf. Comput. Mol. Biol. (RECOMB-97), ACM Press, New York). This ILP formulation is a rigorous approach but its complexity is difficult to predict. The other method is T-Coffee (Notredame et al., 2000) which uses a heuristic enhancement of the equivalence weights which allow it to use the speed and simplicity of the progressive alignment approach while still incorporating information of all alignments in each step of building the MSTA. We find that although the ILP formulation consistently selects a more optimal set of conflict-free equivalences, the differences are small and the quality of the resulting MSTAs are essentially the same for both methods. Given its speed and predictable complexity, our results show that T-Coffee is an attractive alternative for producing high-quality MSTAs.

[1]  W R Taylor,et al.  SSAP: sequential structure alignment program for protein structure comparison. , 1996, Methods in enzymology.

[2]  M. Levitt,et al.  Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core , 1993, Current Biology.

[3]  G. Barton,et al.  Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels , 1992, Proteins.

[4]  M. Levitt,et al.  A unified statistical framework for sequence comparison and structure comparison. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[5]  Ruth Nussinov,et al.  MUSTA - A General, Efficient, Automated Method for Multiple Structure Alignment and Detection of Common Motifs: Application to Proteins , 2001, J. Comput. Biol..

[6]  María Elena Ochagavía,et al.  Progressive combinatorial algorithm for multiple structural alignments: Application to distantly related proteins , 2004, Proteins.

[7]  D. Ding,et al.  A differential geometric treatment of protein structure comparison. , 1994, Bulletin of mathematical biology.

[8]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[9]  P. Koehl,et al.  Protein structure similarities. , 2001, Current opinion in structural biology.

[10]  Ruth Nussinov,et al.  A method for simultaneous alignment of multiple protein structures , 2004, Proteins.

[11]  Ding Da-fu,et al.  A differential geometric treatment of protein structure comparison , 1994 .

[12]  Christine A. Orengo,et al.  Protein Structure Comparison , 2002 .

[13]  Michael Lappe,et al.  A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3 , 2001, Nucleic Acids Res..

[14]  William H. Press,et al.  The Art of Scientific Computing Second Edition , 1998 .

[15]  C Sander,et al.  Mapping the Protein Universe , 1996, Science.

[16]  T. Blundell,et al.  Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. , 1990, Journal of molecular biology.

[17]  Ruth Nussinov,et al.  MASS: multiple structural alignment by secondary structures , 2003, ISMB.

[18]  Cédric Notredame,et al.  3DCoffee: combining protein sequences and structures within multiple sequence alignments. , 2004, Journal of molecular biology.

[19]  F. A. Seiler,et al.  Numerical Recipes in C: The Art of Scientific Computing , 1989 .

[20]  D. Higgins,et al.  T-Coffee: A novel method for fast and accurate multiple sequence alignment. , 2000, Journal of molecular biology.

[21]  Giorgio Gambosi,et al.  Complexity and Approximation , 1999, Springer Berlin Heidelberg.

[22]  Clifford Stein,et al.  Introduction to Algorithms, 2nd edition. , 2001 .

[23]  William R. Taylor,et al.  Structure Comparison and Structure Patterns , 2000, J. Comput. Biol..

[24]  John D. Kececioglu,et al.  The Maximum Weight Trace Problem in Multiple Sequence Alignment , 1993, CPM.

[25]  Philip E. Bourne,et al.  A New Algorithm for the Alignment of Multiple Protein Structures Using Monte Carlo Optimization , 2000, Pacific Symposium on Biocomputing.