Formatt: correcting protein multiple structural alignments by sequence peeking

We present Formatt, a multiple structure alignment program that builds on Matt, a purely geometric multiple structural alignment program, but also takes sequence similarity into account when constructing alignments. On popular benchmark datasets, Formatt produces higher-quality alignments than Matt according to objective measures, most notably the Staccato sequence and structure scores. At the same time, it preserves the advantages in core length and RMSD that Matt, as a flexible structure aligner, holds over other multiple structure alignment programs. Applications include producing better training data for threading methods.
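Core length and RMSD are the two geometric measures used above to compare aligners. The sketch below is for illustration only and shows how both can be computed for a multiple alignment under one common convention. The core is the set of gap-free columns, and RMSD is averaged over all pairs of structures. The sketch assumes the C-alpha coordinates are already superimposed in a common frame. The function names and the input layout are hypothetical. They are not taken from Formatt or Matt, and those tools may define or weight these measures differently.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]
# Hypothetical layout: one row per structure, one entry per alignment column,
# holding that residue's C-alpha coordinate or None for a gap.
Alignment = List[List[Optional[Coord]]]


def core_columns(alignment: Alignment) -> List[int]:
    """Indices of columns in which every structure contributes a residue."""
    n_cols = len(alignment[0])
    return [j for j in range(n_cols) if all(row[j] is not None for row in alignment)]


def core_rmsd(alignment: Alignment) -> Tuple[int, float]:
    """Return (core length, mean pairwise RMSD over the core columns).

    Assumes coordinates are already superimposed; no fitting is done here.
    """
    if len(alignment) < 2:
        raise ValueError("need at least two structures")
    cols = core_columns(alignment)
    if not cols:
        return 0, 0.0
    pair_rmsds = []
    for a, b in combinations(alignment, 2):
        # Sum of squared C-alpha distances over the shared core columns.
        sq = sum(math.dist(a[j], b[j]) ** 2 for j in cols)
        pair_rmsds.append(math.sqrt(sq / len(cols)))
    return len(cols), sum(pair_rmsds) / len(pair_rmsds)


aln = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), None],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 0.0)],
]
print(core_rmsd(aln))  # (2, 1.0): the third column is excluded from the core
```

Under this convention, a gap in any one structure removes that column from the core. A longer core at a similar RMSD therefore indicates that more residues were aligned consistently across the whole family.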
