MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, Cα only models, Alternative alignments, and Non-sequential alignments

Background: Protein pairs that share the same secondary structure packing arrangement but differ in topology have attracted much attention, both from an evolutionary perspective and from that of the physical chemistry of protein structures. Further investigation of such relationships would give hints as to how proteins can change their fold in the course of evolution, as well as insight into the physico-chemical properties of secondary structure packing. For this purpose, highly accurate, sequence-order-independent structure comparison methods are needed.

Results: We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, Cα only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSEs). One of the key features of the algorithm is its use of a multiple vector representation for each SSE, which makes it possible to correctly treat the bent or twisted nature of long SSEs. We compared MICAN with nine other publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets that include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods at reproducing the reference alignments of the non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed top-level performance on the sequential test sets. We also show that MICAN is the fastest non-sequential structure alignment program among all the programs we examined.

Conclusions: MICAN is the fastest and most accurate of the non-sequential alignment programs we examined. These results suggest that MICAN is a highly effective tool for automatically detecting non-trivial structural relationships between proteins, such as circular permutations and segment swapping, many of which have so far been identified manually by human experts. The source code of MICAN is freely downloadable at http://www.tbp.cse.nagoya-u.ac.jp/MICAN.
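To illustrate why a multiple vector representation helps with bent or twisted SSEs, the following is a minimal sketch, not MICAN's actual implementation. It assumes an SSE is given as a list of Cα coordinates and describes it by one unit direction vector per sliding window rather than a single end-to-end vector. The function names `centroid` and `sse_vectors`, the window size of 4, and the centroid-based direction estimate are all hypothetical choices made for this example.

```python
from math import sqrt


def centroid(points):
    """Return the arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def sse_vectors(ca_coords, window=4):
    """Describe one SSE by several local unit direction vectors.

    ca_coords: list of (x, y, z) C-alpha coordinates of a single SSE.
    window:    residues per local segment (an assumed value, not taken
               from the MICAN paper).

    Each vector points from the centroid of the first half of a window
    to the centroid of its second half, so a bent SSE yields vectors
    whose directions change along its length.
    """
    half = window // 2
    vectors = []
    for start in range(len(ca_coords) - window + 1):
        segment = ca_coords[start:start + window]
        head = centroid(segment[:half])
        tail = centroid(segment[half:])
        diff = tuple(t - h for h, t in zip(head, tail))
        norm = sqrt(sum(x * x for x in diff))
        if norm == 0.0:
            continue  # degenerate window; skip instead of dividing by zero
        vectors.append(tuple(x / norm for x in diff))
    return vectors


def dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


# A straight toy "SSE" along x, and one bent by 90 degrees halfway.
straight = [(1.5 * i, 0.0, 0.0) for i in range(8)]
bent = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0),
        (3, 1, 0), (3, 2, 0), (3, 3, 0)]

straight_vecs = sse_vectors(straight)
bent_vecs = sse_vectors(bent)
```

For the straight element every local vector is identical, so a single vector would have sufficed. For the bent element the first and last local vectors are perpendicular, which a single end-to-end vector would hide.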
