MUSTA - A General, Efficient, Automated Method for Multiple Structure Alignment and Detection of Common Motifs: Application to Proteins

We present an algorithm for multiple structure alignment and for detection of recurring substructural motifs. So far we have implemented it for the comparison of protein structures, but the method is general: it also applies to comparing RNA structures and to detecting a pharmacophore in a series of drug molecules. Because it is independent of sequence order, it can further be applied to detecting motifs on protein surfaces, at interfaces, and in binding or active sites. Many methods exist for pairwise structure comparison, but only a handful address multiple structure alignment, and most of those treat it as a collection of pairwise comparisons. The algorithm presented here automatically finds the largest common substructure (core) of atoms that appears in all the molecules of the ensemble, detecting the core and computing the structural alignment simultaneously. One of the molecules is taken as the reference; the others are the source molecules. The algorithm begins by finding small substructures common to all the proteins in the ensemble, that is, small initial fragments that have congruent copies in each protein. These are stored in special arrays termed combinatorial buckets, which define sets of multistructural alignments from the source molecules that coincide with the same small set of reference atoms (Cα atoms here). The substructures are then extended, through processing of the combinatorial buckets, by clustering the superpositions (transformations). The method is very efficient.
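As a rough illustration of the seeding step, the sketch below groups congruent Cα triplets by the reference triplet they match and keeps only those groups that are matched in every source molecule. It is a minimal sketch under stated assumptions, not the published implementation: the function names (`side_lengths`, `congruent`, `build_buckets`), the choice of triplets as the "small substructure", the distance tolerance `tol`, and the brute-force enumeration are all hypothetical choices made for clarity.

```python
from itertools import combinations, permutations
from math import dist


def side_lengths(p, q, r):
    """Return the three pairwise distances of an ordered triplet of 3D points."""
    return (dist(p, q), dist(q, r), dist(p, r))


def congruent(a, b, tol):
    """True if two ordered side-length tuples agree within tol (assumed units: angstroms)."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def build_buckets(reference, sources, tol=0.5):
    """Hypothetical 'combinatorial bucket' construction.

    reference: list of (x, y, z) coordinates of the reference molecule.
    sources:   list of such coordinate lists, one per source molecule.

    Returns a dict mapping each reference triplet (a tuple of indices) to a
    list with one entry per source molecule; each entry lists the ordered
    source triplets congruent to the reference triplet. A bucket is kept only
    if every source molecule contributes at least one match.
    """
    buckets = {}
    for ref_idx in combinations(range(len(reference)), 3):
        ref_sides = side_lengths(*(reference[i] for i in ref_idx))
        per_source = []
        for coords in sources:
            # Ordered triplets, so position k in the match corresponds to
            # position k in the reference triplet.
            hits = [
                src_idx
                for src_idx in permutations(range(len(coords)), 3)
                if congruent(ref_sides, side_lengths(*(coords[i] for i in src_idx)), tol)
            ]
            if not hits:
                break  # this reference triplet is not common to all molecules
            per_source.append(hits)
        else:
            buckets[ref_idx] = per_source
    return buckets


# Toy data (made up for illustration, not real protein coordinates).
reference = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (0, 0, 5)]
source_a = [(10, 10, 10), (13, 10, 10), (10, 14, 10), (10, 10, 15)]  # translated copy
source_b = [(0, 0, 0), (0, 3, 0), (-4, 0, 0), (0, 0, 9)]  # rotated, last atom displaced

common = build_buckets(reference, [source_a, source_b], tol=0.1)
print(sorted(common))
```

Each surviving bucket pairs one small set of reference atoms with matching atoms in every source molecule, so each match implies a candidate superposition. Extending the seeds by clustering those transformations is not shown here, and this exhaustive enumeration is far too slow for real structures.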
