Smolign: A Spatial Motifs-Based Protein Multiple Structural Alignment Method

An effective tool for protein multiple structural alignment (MSTA) is essential for discovering and analyzing biologically significant structural motifs that can help solve functional annotation and drug design problems. Existing MSTA methods collect residue correspondences mostly through pairwise comparison of consecutive fragments, which can lead to suboptimal alignments, especially when the similarity among the proteins is low. We introduce a novel strategy based on (1) building a contact-window based motif library from the protein structural data, (2) discovering and extending common alignment seeds from this library, and (3) optimally superimposing multiple structures according to these alignment seeds using an enhanced partial order curve comparison method. The ability of our strategy to detect multiple correspondences simultaneously, to capture alignments globally, and to support flexible alignments yields a sensitive and robust automated algorithm that can expose similarities among protein structures even under low similarity conditions. On several protein structure data sets that span various structural folds and represent different protein similarity levels, our method yields better alignment results than other popular MSTA methods. A web-based alignment tool, a downloadable executable, and detailed alignment results for the data sets used here are available at http://sacan.biomed.drexel.edu/Smolign and http://bio.cse.ohio-state.edu/Smolign.
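To make the notion of a contact-window motif concrete, the following minimal sketch pairs short windows of consecutive residues whose center residues lie close together in space. It is an illustration under stated assumptions, not the Smolign implementation: the function name `contact_windows`, the Cα-only representation, the 6 Å cutoff, the window half-width, and the minimum sequence separation are all hypothetical choices that the abstract does not specify.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def contact_windows(ca_coords, cutoff=6.0, half_width=1, min_separation=3):
    """Pair residue windows whose center residues are in spatial contact.

    All parameters are illustrative assumptions, not values from the paper.
    ca_coords: list of (x, y, z) C-alpha coordinates, one per residue.
    Returns a list of (window_a, window_b) tuples of residue indices.
    """
    n = len(ca_coords)
    motifs = []
    for i, j in combinations(range(n), 2):
        # Skip pairs that are too close along the sequence.
        if j - i < min_separation:
            continue
        # Both windows must fit entirely inside the chain.
        if i - half_width < 0 or j + half_width >= n:
            continue
        # The two windows must not share residues.
        if i + half_width >= j - half_width:
            continue
        # Keep the pair only if the center residues are within the cutoff.
        if dist(ca_coords[i], ca_coords[j]) <= cutoff:
            window_a = tuple(range(i - half_width, i + half_width + 1))
            window_b = tuple(range(j - half_width, j + half_width + 1))
            motifs.append((window_a, window_b))
    return motifs


# Toy hairpin: two antiparallel strands 5 units apart (made-up coordinates).
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]
motifs = contact_windows(toy_coords)
for window_a, window_b in motifs:
    print(window_a, window_b)
```

On the toy hairpin, the sketch pairs each window on the first strand with the window directly across from it on the second strand. A library of such window pairs, collected over all input structures, is the kind of object from which common alignment seeds could then be searched.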
