Optimization of multiple‐sequence alignment based on multiple‐structure alignment

Routinely used multiple‐sequence alignment methods use only sequence information. Consequently, they may produce inaccurate alignments. Multiple‐structure alignment methods, on the other hand, optimize structural alignment by ignoring sequence information. Here, we present an optimization method that unifies sequence and structure information. The alignment score is based on standard amino acid substitution probabilities combined with newly computed three‐dimensional structure alignment probabilities. The advantage of our alignment scheme is in its ability to produce more accurate multiple alignments. We demonstrate the usefulness of the method in three applications: 1) computing more accurate multiple‐sequence alignments, 2) analyzing protein conformational changes, and 3) computation of amino acid structure‐sequence conservation with application to protein–protein docking prediction. The method is available at http://bioinfo3d.cs.tau.ac.il/staccato/. Proteins 2006. © 2005 Wiley‐Liss, Inc.

[1]  T. Blundell,et al.  Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. , 1990, Journal of molecular biology.

[2]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[3]  D Fischer,et al.  A computer vision based technique for 3-D sequence-independent structural comparison of proteins. , 1993, Protein engineering.

[4]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[5]  T. P. Flores,et al.  Multiple protein structure alignment , 1994, Protein science : a publication of the Protein Society.

[6]  S. Bryant,et al.  Threading a database of protein cores , 1995, Proteins.

[7]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[8]  F. Cohen,et al.  An evolutionary trace method defines binding surfaces common to protein families. , 1996, Journal of molecular biology.

[9]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[10]  P E Bourne,et al.  Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. , 1998, Protein engineering.

[11]  John P. Overington,et al.  HOMSTRAD: A database of protein structure alignments for homologous families , 1998, Protein science : a publication of the Protein Society.

[12]  Sim,et al.  Protein Threading Based on Multiple Protein Structure Alignment. , 1999, Genome informatics. Workshop on Genome Informatics.

[13]  Olivier Poch,et al.  BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs , 1999, Bioinform..

[14]  S. Hubbard,et al.  Protein tyrosine kinase structure and function. , 2000, Annual review of biochemistry.

[15]  William R. Taylor,et al.  Structure Comparison and Structure Patterns , 2000, J. Comput. Biol..

[16]  Michael Lappe,et al.  A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3 , 2001, Nucleic Acids Res..

[17]  Philip E. Bourne,et al.  A New Algorithm for the Alignment of Multiple Protein Structures Using Monte Carlo Optimization , 2000, Pacific Symposium on Biocomputing.

[18]  Ruth Nussinov,et al.  Efficient Unbound Docking of Rigid Molecules , 2002, WABI.

[19]  Ruth Nussinov,et al.  MultiProt - A Multiple Protein Structural Alignment Algorithm , 2002, WABI.

[20]  W. S. Valdar,et al.  Scoring residue conservation , 2002, Proteins.

[21]  B. Honig,et al.  On the role of structural information in remote homology detection and sequence alignment: new methods using hybrid sequence profiles. , 2003, Journal of molecular biology.

[22]  B. Honig,et al.  Structural genomics: Computational methods for structure analysis , 2003, Protein science : a publication of the Protein Society.

[23]  H. Wolfson,et al.  Multiple structural alignment by secondary structures: Algorithm and applications , 2003, Protein science : a publication of the Protein Society.

[24]  Carsten Peterson,et al.  Potential for dramatic improvement in sequence alignment against structures of remote homologous proteins by extracting structural information from multiple structure alignment. , 2003, Journal of molecular biology.

[25]  Tal Pupko,et al.  Structural Genomics , 2005 .

[26]  Ruth Nussinov,et al.  A method for simultaneous alignment of multiple protein structures , 2004, Proteins.

[27]  Cédric Notredame,et al.  3DCoffee: combining protein sequences and structures within multiple sequence alignments. , 2004, Journal of molecular biology.

[28]  Patrice Koehl,et al.  The ASTRAL Compendium in 2004 , 2003, Nucleic Acids Res..

[29]  Ben M. Webb,et al.  Comparative Protein Structure Modeling Using MODELLER , 2007, Current protocols in protein science.