Iterative Non-Sequential Protein Structural Alignment

Structural similarity between proteins gives insight into the evolutionary relationships between proteins that have low sequence similarity. In this paper, we present a novel approach called STSA for non-sequential pairwise structural alignment. Starting from an initial alignment, our approach iterates over a two-step process, a superposition step followed by an alignment step, until convergence. Given two superposed structures, we propose a novel greedy algorithm that can construct both sequential and non-sequential alignments. The quality of STSA alignments is evident in their high agreement with the reference alignments in the challenging-to-align RIPC set. Moreover, on a dataset of 4410 protein pairs selected from the CATH database, STSA achieves high sensitivity and specificity, is competitive with state-of-the-art alignment methods, and gives longer alignments with lower RMSD. The STSA software, along with the data sets, will be made available online at http://www.cs.rpi.edu/-zaki/software/STSA.
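
To make the idea of a non-sequential alignment concrete, the following is a minimal sketch of one possible greedy alignment step applied to two structures that are assumed to be already superposed. It is not the STSA algorithm, whose details the abstract does not give. The function names `greedy_alignment` and `rmsd`, the use of one representative 3D point per residue, and the 5.0 distance cutoff `max_dist` are all assumptions made for illustration. The superposition step and the outer iteration are omitted.

```python
import math

def greedy_alignment(coords_a, coords_b, max_dist=5.0):
    """Pair residues of two superposed structures, closest pairs first.

    coords_a, coords_b: lists of (x, y, z) tuples, one per residue.
    max_dist: hypothetical cutoff; pairs farther apart are never matched.
    Returns a list of (i, j) index pairs, sorted by i. The j indices need
    not be increasing, so the alignment may be non-sequential.
    """
    # Collect every residue pair that lies within the cutoff.
    candidates = []
    for i, p in enumerate(coords_a):
        for j, q in enumerate(coords_b):
            d = math.dist(p, q)
            if d <= max_dist:
                candidates.append((d, i, j))

    # Accept pairs in order of increasing distance, one-to-one.
    candidates.sort()
    used_a, used_b, pairs = set(), set(), []
    for d, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)

def rmsd(coords_a, coords_b, pairs):
    """Root-mean-square deviation over the aligned pairs (0.0 if none)."""
    if not pairs:
        return 0.0
    total = sum(math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))

if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    b = [(20.5, 0.0, 0.0), (0.2, 0.0, 0.0), (50.0, 0.0, 0.0)]
    pairs = greedy_alignment(a, b)
    print(pairs, rmsd(a, b, pairs))
```

In the small example at the bottom, residue 0 of the first structure is matched to residue 1 of the second and residue 2 to residue 0, so the matched order is reversed; a sequential method that requires matched indices to increase in both structures could not report both pairs. In an iterative scheme of the kind the abstract describes, the pairs found here would be fed back into a superposition step, and the two steps repeated until the alignment stops changing.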
