Ballast: A Ball-Based Algorithm for Structural Motifs

Structural motifs encapsulate local sequence-structure-function relationships characteristic of related proteins, enabling the prediction of functional characteristics of new proteins, providing molecular-level insights into how those functions are performed, and supporting the development of variants specifically maintaining or perturbing function in concert with other properties. Numerous computational methods have been developed to search through databases of structures for instances of specified motifs. However, it remains an open problem how best to leverage the local geometric and chemical constraints underlying structural motifs in order to develop motif-finding algorithms that are both theoretically and practically efficient. We present a simple, general, efficient approach, called Ballast (ball-based algorithm for structural motifs), to match given structural motifs to given structures. Ballast combines the best properties of previously developed methods, exploiting the composition and local geometry of a structural motif and its possible instances in order to effectively filter candidate matches. We show that on a wide range of motif-matching problems, Ballast efficiently and effectively finds good matches, and we provide theoretical insights into why it works well. By supporting generic measures of compositional and geometric similarity, Ballast provides a powerful substrate for the development of motif-matching algorithms.

[1]  K. S. Arun,et al.  Least-Squares Fitting of Two 3-D Point Sets , 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  George S. Lueker,et al.  A data structure for orthogonal range queries , 1978, 19th Annual Symposium on Foundations of Computer Science (sfcs 1978).

[3]  G. H. Reed,et al.  The enolase superfamily: a general strategy for enzyme-catalyzed abstraction of the alpha-protons of carboxylic acids. , 1996, Biochemistry.

[4]  Ashish V. Tendulkar,et al.  Functional sites in protein families uncovered via an objective and automated graph theoretic approach. , 2003, Journal of molecular biology.

[5]  C. Orengo,et al.  Protein function annotation by homology-based inference , 2009, Genome Biology.

[6]  László Lovász,et al.  Interactive proofs and the hardness of approximating cliques , 1996, JACM.

[7]  Eleanor J. Gardiner,et al.  Clique-detection algorithms for matching three-dimensional molecular structures. , 1997, Journal of molecular graphics & modelling.

[8]  Wei Wang,et al.  Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development , 2009, J. Comput. Aided Mol. Des..

[9]  Conrad C. Huang,et al.  Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database. , 2006, Biochemistry.

[10]  G J Kleywegt,et al.  Recognition of spatial motifs in protein structures. , 1999, Journal of molecular biology.

[11]  P. Babbitt,et al.  Superfamily active site templates , 2004, Proteins.

[12]  Eli Upfal,et al.  Probability and Computing: Randomized Algorithms and Probabilistic Analysis , 2005 .

[13]  Lydia E. Kavraki,et al.  The LabelHash algorithm for substructure matching , 2010, BMC Bioinformatics.

[14]  Wei Wang,et al.  Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: II. Case studies and applications , 2009, J. Comput. Aided Mol. Des..

[15]  M. Milik,et al.  Common Structural Cliques: a tool for protein structure and function analysis. , 2003, Protein engineering.

[16]  Lydia E. Kavraki,et al.  The MASH Pipeline for Protein Function Prediction and an Algorithm for the Geometric Refinement of 3D Motifs , 2007, J. Comput. Biol..

[17]  C. Bron,et al.  Algorithm 457: finding all cliques of an undirected graph , 1973 .

[18]  Janet M. Thornton,et al.  An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis , 2003, Bioinform..

[19]  G J Williams,et al.  The Protein Data Bank: a computer-based archival file for macromolecular structures. , 1978, Archives of biochemistry and biophysics.

[20]  M. Gerstein,et al.  The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. , 1999, Journal of molecular biology.

[21]  Jack Snoeyink,et al.  Almost-Delaunay simplices: nearest neighbor relations for imprecise points , 2004, SODA '04.

[22]  Janet M. Thornton,et al.  The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data , 2004, Nucleic Acids Res..

[23]  Rafael Najmanovich,et al.  Detection of 3 D atomic similarities and their use in the discrimination of small molecule protein-binding sites , 2008 .

[24]  Haim J. Wolfson,et al.  Geometric hashing: an overview , 1997 .

[25]  Lei Xie,et al.  Detecting evolutionary relationships across existing fold space, using sequence order-independent profile–profile alignments , 2008, Proceedings of the National Academy of Sciences.

[26]  Janet M. Thornton,et al.  Detection of 3D atomic similarities and their use in the discrimination of small molecule protein-binding sites , 2008, ECCB.

[27]  J. Thornton,et al.  Tess: A geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites , 1997, Protein science : a publication of the Protein Society.

[28]  Richard M. Karp,et al.  Reducibility Among Combinatorial Problems , 1972, 50 Years of Integer Programming.

[29]  S. Muthukrishnan,et al.  The bin-covering technique for thresholding random geometric graph properties , 2005, SODA '05.

[30]  H. Wolfson,et al.  Recognition of Functional Sites in Protein Structures☆ , 2004, Journal of Molecular Biology.

[31]  Dan E. Willard Predicate-Oriented Database Search Algorithms , 1978, Outstanding Dissertations in the Computer Sciences.

[32]  P. Willett,et al.  A graph-theoretic approach to the identification of three-dimensional patterns of amino acid side-chains in protein structures. , 1994, Journal of molecular biology.

[33]  J. Dall,et al.  Random geometric graphs. , 2002, Physical review. E, Statistical, nonlinear, and soft matter physics.

[34]  Julian R. Ullmann,et al.  An Algorithm for Subgraph Isomorphism , 1976, J. ACM.

[35]  H. Wolfson,et al.  Efficient detection of three-dimensional structural motifs in biological macromolecules by computer vision techniques. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[36]  Anne Jacobson Nanotechnology meets marine biology , 2002, Computing in Science & Engineering.