Automated multiple structure alignment and detection of a common substructural motif

While a number of approaches have been geared toward multiple sequence alignments, to date there have been very few approaches to multiple structure alignment and detection of a recurring substructural motif. Among these, none performs both multiple structure comparison and motif detection simultaneously. Further, none considers all structures at the same time; instead, they start from pairwise molecular comparisons. We present such a multiple structural alignment algorithm. Given an ensemble of protein structures, the algorithm automatically finds the largest common substructure (core) of Cα atoms that appears in all the molecules in the ensemble. The detection of the core and the structural alignment are done simultaneously. Additional structural alignments are also obtained and are ranked by the sizes of the substructural motifs present in the entire ensemble. The method is based on the geometric hashing paradigm. As in our previous structural comparison algorithms, it compares the structures in an amino acid sequence order‐independent way, and hence the resulting alignment is unaffected by insertions, deletions and protein chain directionality. As such, it can be applied to protein surfaces, protein–protein interfaces and protein cores to find the optimal and suboptimal spatially recurring substructural motifs. There is no predefinition of the motif. We describe the algorithm and demonstrate its efficiency. In particular, we present a range of results for several protein ensembles with different folds, belonging to the same or to different families. Since the algorithm treats molecules as collections of points in three‐dimensional space, it can also be applied to other molecules, such as RNA or drugs. Proteins 2001;43:235–245. © 2001 Wiley‐Liss, Inc.
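The abstract describes geometric hashing only at a high level. As a rough illustration of the underlying idea, the Python sketch below builds a reference frame from every ordered triplet of points, expresses all points in that frame (coordinates that do not change under rotation or translation), hashes them into bins, and lets a pivot molecule vote for matching frames in the other molecules. It is a hypothetical, simplified sketch, not the authors' algorithm: it uses molecule 0 as a pivot (a pairwise-from-a-pivot strategy, which the paper says its method avoids), it greedily intersects each molecule's best-voted match (a heuristic that need not find the true largest core), and the names `frame`, `invariants`, `build_table`, `lookup`, `common_core` and the parameters `bin_size` and `tol` are invented for this example. Structures are plain NumPy arrays of Cα-like coordinates.

```python
# Hypothetical, simplified sketch of geometric hashing for a common core.
# NOT the paper's algorithm: molecule 0 is used as a pivot, and each other
# molecule's best-voted frame is intersected greedily (a heuristic).
import itertools
from collections import defaultdict

import numpy as np


def frame(p, q, r, eps=1e-9):
    """Right-handed orthonormal frame from an ordered triplet of 3D points.

    Returns (origin, R) with the axes as rows of R, or None if collinear.
    """
    x = q - p
    z = np.cross(x, r - p)
    nx, nz = np.linalg.norm(x), np.linalg.norm(z)
    if nx < eps or nz < eps:
        return None
    x, z = x / nx, z / nz
    return p, np.vstack([x, np.cross(z, x), z])


def invariants(points, origin, R):
    """Point coordinates in the local frame; unchanged by any rigid motion."""
    return (points - origin) @ R.T


def build_table(molecules, bin_size):
    """Hash every point of molecules[1:] in every reference frame of its molecule."""
    table = defaultdict(list)
    for m in range(1, len(molecules)):
        pts = molecules[m]
        for tri in itertools.permutations(range(len(pts)), 3):
            f = frame(*pts[list(tri)])
            if f is None:
                continue
            for j, c in enumerate(invariants(pts, *f)):
                key = tuple(np.floor(c / bin_size).astype(int))
                table[key].append((m, tri, j, c))
    return table


def lookup(table, c, bin_size, tol):
    """Yield hashed entries within tol of c; tol <= bin_size, so the 27 neighbor bins suffice."""
    base = np.floor(c / bin_size).astype(int)
    for d in itertools.product((-1, 0, 1), repeat=3):
        for entry in table.get(tuple(base + d), ()):
            if np.linalg.norm(entry[3] - c) <= tol:
                yield entry


def common_core(molecules, bin_size=1.0, tol=0.5):
    """Sorted indices of pivot (molecules[0]) points matched in every other molecule."""
    if tol > bin_size:
        raise ValueError("tol must not exceed bin_size")
    table = build_table(molecules, bin_size)
    pivot = molecules[0]
    best = []
    for tri in itertools.permutations(range(len(pivot)), 3):
        f = frame(*pivot[list(tri)])
        if f is None:
            continue
        # votes[(molecule, frame triplet)] -> {pivot index: matched point index}
        votes = defaultdict(dict)
        for i, c in enumerate(invariants(pivot, *f)):
            for m, otri, j, _ in lookup(table, c, bin_size, tol):
                votes[(m, otri)].setdefault(i, j)
        core = None
        for m in range(1, len(molecules)):
            # keep the pivot points matched under this molecule's best-voted frame
            cands = [v for (mm, _), v in votes.items() if mm == m]
            matched = set(max(cands, key=len)) if cands else set()
            core = matched if core is None else core & matched
        if core is not None and len(core) > len(best):
            best = sorted(core)
    return best


if __name__ == "__main__":
    # Toy coordinates (made up for illustration, in Angstroms).
    core = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0],
                     [2.1, 5.9, 1.7], [-1.4, 3.3, 3.9], [1.0, -2.5, 4.4]])
    c, s = np.cos(0.7), np.sin(0.7)
    Q = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotation about z
    pivot = np.vstack([core, [[1000.0, 0, 0], [0, 1000.0, 0]]])  # two non-core points
    mol_b = np.vstack([core @ Q.T + [5.0, -3.0, 2.0], [[20.0, 20, 20]]])[[3, 0, 6, 5, 1, 4, 2]]
    mol_c = (core @ Q + [-7.0, 1.0, 0.5])[::-1]  # rotated, translated, order reversed
    print(common_core([pivot, mol_b, mol_c]))  # [0, 1, 2, 3, 4, 5]
```

In the toy example the six shared points are recovered even though `mol_b` and `mol_c` are rotated, translated and stored in a different order, because points are matched by geometry alone and never by index; this mirrors the sequence order independence described in the abstract. Enumerating every ordered triplet gives O(n³) frames per molecule, so this brute-force version is only practical for small point sets.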