Structural search and retrieval using a tableau representation of protein folding patterns

Comparison and classification of folding patterns from a database of protein structures is crucial to understanding the principles of protein architecture, evolution and function. Current methods of searching for proteins with similar folding patterns are slow and computationally intensive, and the sharp growth in the number of known protein structures poses severe challenges for structural comparison. Methods are needed that can search the database of structures both accurately and rapidly. We provide several methods to search for similar folding patterns using a concise tableau representation of proteins that encodes the relative geometry of secondary structural elements. Our first approach extracts identical and very closely related protein folding patterns in constant time per hit. Next, we address the computationally hard problem of extracting maximally similar subtableaux when comparing two tableaux. We solve this problem using quadratic and linear integer programming formulations and demonstrate their power to identify subtle structural similarities, especially when the protein structures have diverged significantly. Finally, we describe TableauSearch, a rapid and accurate method for comparing a query structure against a database of protein domains. TableauSearch is fast enough to search the entire structural database in seconds on a standard desktop computer. Our analysis of TableauSearch on many queries shows that the method is very accurate in identifying similarities of folding patterns, even between distantly related proteins.

Availability: A web server implementing TableauSearch is available from http://hollywood.bx.psu.edu/TabSearch.
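The constant-time lookup of identical folding patterns can be illustrated with a hash-table index over discretized tableaux. The sketch below is only an illustration under stated assumptions, not the published implementation: the four-way angle binning, the one-letter labels, the use of an MD5 digest as the key, and the names `orientation_code`, `tableau_key`, `build_index` and the toy domains are all hypothetical.

```python
import hashlib
from collections import defaultdict


def orientation_code(angle_deg):
    """Map an inter-element angle in degrees to a coarse one-letter label.

    Assumption: a simplified four-bin discretization chosen for illustration.
    """
    if -45 <= angle_deg < 45:
        return "P"  # roughly parallel
    if 45 <= angle_deg < 135:
        return "R"
    if -135 <= angle_deg < -45:
        return "L"
    return "O"  # roughly antiparallel


def tableau_key(sse_types, angles):
    """Serialize the lower triangle of a tableau and hash it.

    sse_types: list of secondary structure labels, e.g. 'H' (helix), 'E' (strand).
    angles: symmetric matrix of relative angles between elements, in degrees.
    Diagonal cells hold the element type; off-diagonal cells hold the
    discretized orientation.
    """
    parts = []
    for i in range(len(sse_types)):
        for j in range(i + 1):
            cell = sse_types[i] if i == j else orientation_code(angles[i][j])
            parts.append(cell)
    return hashlib.md5("".join(parts).encode("ascii")).hexdigest()


def build_index(database):
    """Group structure names by tableau key so each lookup is a dict access."""
    index = defaultdict(list)
    for name, (sse_types, angles) in database.items():
        index[tableau_key(sse_types, angles)].append(name)
    return index


# Toy data (hypothetical): domA and domB differ slightly in their angles
# but fall into the same bins, so they share one tableau key.
database = {
    "domA": (["H", "E", "E"], [[0, 30, -160], [30, 0, 170], [-160, 170, 0]]),
    "domB": (["H", "E", "E"], [[0, 20, -150], [20, 0, 175], [-150, 175, 0]]),
    "domC": (["H", "H", "E"], [[0, 100, -90], [100, 0, 10], [-90, 10, 0]]),
}
index = build_index(database)
hits = index.get(tableau_key(*database["domA"]), [])
print(sorted(hits))
```

Because structures with the same discretized tableau collide on one key, retrieving them costs a single dictionary access regardless of database size. Finding similar but non-identical patterns needs the subtableau comparison described in the abstract, which this sketch does not attempt.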