Development and validation of a consistency-based multiple structure alignment algorithm

SUMMARY: We introduce an algorithm that uses the information gained from simultaneous consideration of an entire group of related proteins to create multiple structure alignments (MSTAs). Consistency-based alignment (CBA) first harnesses the information contained within regions that are consistently aligned among a set of pairwise superpositions in order to realign pairs of proteins through both global and local refinement methods. It then constructs a multiple alignment that is maximally consistent with the improved pairwise alignments. We validate CBA's alignments by assessing their accuracy in regions where at least two of the aligned structures contain the same conserved sequence motif.

RESULTS: CBA correctly aligns well over 90% of motif residues in superpositions of proteins belonging to the same family or superfamily, and it outperforms a number of previously reported MSTA algorithms.
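To make the notion of "consistently aligned" concrete, the following is a minimal sketch of one common way to quantify consistency among pairwise alignments: for each residue pair aligned directly between two structures, count how many third structures agree with that pairing when the alignment is routed through them. This is an illustration only and is not taken from the CBA implementation. The function name `consistency_support`, the dictionary layout of `pairwise`, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def consistency_support(pairwise, a, b):
    """Hypothetical helper: for each residue pair (i, j) aligned between
    structures a and b, count the third structures c whose alignments
    a -> c and c -> b also map residue i onto residue j.

    `pairwise` maps an ordered pair of structure names to a dict from
    residue index in the first structure to residue index in the second.
    """
    direct = pairwise[(a, b)]
    names = {name for pair in pairwise for name in pair}
    support = defaultdict(int)
    for c in names - {a, b}:
        a_to_c = pairwise.get((a, c))
        c_to_b = pairwise.get((c, b))
        if a_to_c is None or c_to_b is None:
            continue  # no path through c in this toy representation
        for i, j in direct.items():
            k = a_to_c.get(i)
            if k is not None and c_to_b.get(k) == j:
                support[(i, j)] += 1
    # Report every directly aligned pair, including unsupported ones.
    return {pair: support.get(pair, 0) for pair in direct.items()}


# Toy data (assumed): structure C agrees with the A-B alignment at
# residues 0 and 1 but disagrees at residue 2.
pairwise = {
    ("A", "B"): {0: 0, 1: 1, 2: 3},
    ("A", "C"): {0: 0, 1: 1, 2: 2},
    ("C", "B"): {0: 0, 1: 1, 2: 2},
}
result = consistency_support(pairwise, "A", "B")
```

In a scheme of this kind, pairs with high support would be treated as reliable anchors, while pairs with low support would be candidates for realignment. How CBA actually weights and refines such regions is not specified in the abstract.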
