Evolutionary inaccuracy of pairwise structural alignments

Motivation: Structural alignment methods are widely used to generate gold standard alignments for improving multiple sequence alignments and transferring functional annotations, as well as for assigning structural distances between proteins. However, the correctness of the alignments generated by these methods is difficult to assess objectively, since little is known about the exact evolutionary history of most proteins. Because homology is an equivalence relation, an upper bound on alignment quality can be found by assessing the consistency of alignments. Measuring the consistency of current structure alignment methods and determining the causes of inconsistencies can therefore provide information on the quality of these methods and suggest possibilities for further improvement.

Results: We analyze the self-consistency of seven widely used structural alignment methods (SAP, TM-align, Fr-TM-align, MAMMOTH, DALI, CE and FATCAT) on a diverse, non-redundant set of 1863 domains from the SCOP database and demonstrate that, even for relatively similar proteins, the degree of inconsistency of the alignments at the residue level is high (30%). We further show that levels of consistency vary substantially between methods, with two methods (SAP and Fr-TM-align) producing more consistent alignments than the rest. Inconsistency is found to be higher near gaps and for proteins of low structural complexity, as well as for helices. The ability of the methods to identify good structural alignments is also assessed using geometric measures, for which FATCAT (flexible mode) is found to be the best performer despite being highly inconsistent. We conclude that there is substantial scope for improving the consistency of structural alignment methods.

Contact: msadows@nimr.mrc.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
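The consistency argument can be illustrated with a small sketch. If residue equivalence is transitive, then for three proteins A, B and C, following the A-B alignment and then the B-C alignment should lead to the same residue as the direct A-C alignment. The Python below is only an illustration of that idea under simple assumptions: the name `triplet_consistency`, the representation of an alignment as a dictionary of residue indices, and the choice to skip chains broken by a gap are all hypothetical and are not taken from the paper, whose exact consistency measure is not specified in this abstract.

```python
from typing import Dict, Optional

# Hypothetical representation: an alignment maps a residue index in the
# first protein to the residue index it is aligned with in the second.
# Residues aligned to a gap are simply absent from the mapping.
Alignment = Dict[int, int]


def triplet_consistency(ab: Alignment, bc: Alignment, ac: Alignment) -> Optional[float]:
    """Fraction of A residues for which the path A->B->C agrees with A->C.

    Only residues of A whose path through B reaches a residue of C are
    counted (an assumption of this sketch). Returns None if no residue
    can be checked.
    """
    checked = 0
    agree = 0
    for a, b in ab.items():
        c_via_b = bc.get(b)
        if c_via_b is None:
            continue  # path through B ends in a gap; nothing to compare
        checked += 1
        if ac.get(a) == c_via_b:
            agree += 1
    return agree / checked if checked else None


# Toy example: residue 2 of A reaches residue 3 of C via B,
# but the direct A-C alignment pairs it with residue 2.
ab = {0: 0, 1: 1, 2: 2, 3: 4}
bc = {0: 0, 1: 1, 2: 3, 4: 4}
ac = {0: 0, 1: 1, 2: 2, 3: 4}
score = triplet_consistency(ab, bc, ac)
print(score)  # 0.75
```

In this toy case three of the four checked residues are consistent, so a quarter of the residue-level assignments cannot all be evolutionarily correct, whichever of the three alignments is at fault. This is why consistency gives an upper bound on quality rather than a direct measure of it: consistent alignments may still be wrong, but inconsistent ones must contain an error.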
