Trilogy: Discovery of sequence–structure patterns across diverse proteins

We describe a new computer program, Trilogy, for the automated discovery of sequence–structure patterns in proteins. Trilogy implements a pattern discovery algorithm that begins with an exhaustive analysis of flexible three-residue patterns; a subset of these patterns is selected as seeds for an extension process in which longer patterns are identified. A key feature of the method is its explicit treatment of both the sequence and structure components of these motifs: each Trilogy pattern is a pair consisting of a sequence pattern and a structure pattern. Matches to each component pattern are identified independently, which allows the program to assign each sequence–structure pattern a significance score that assesses the degree of correlation between the corresponding sequence and structure motifs. Trilogy identifies several thousand high-scoring patterns that occur across protein families, including both previously identified and novel motifs. We expect that these sequence–structure patterns will be useful in predicting protein structure from sequence, annotating newly determined protein structures, and identifying novel motifs of potential functional or structural significance.
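The abstract does not say how the significance score is computed. As an illustration only, the sketch below uses one common way to measure correlation between two independently collected match sets: a hypergeometric tail probability on their overlap. The function name `overlap_p_value`, its parameters, and the choice of the hypergeometric test are assumptions made for this example, not a description of Trilogy's actual scoring.

```python
from math import comb


def overlap_p_value(n_total: int, n_seq: int, n_struct: int, n_both: int) -> float:
    """Hypothetical score: P(overlap >= n_both) under independence.

    n_total  -- number of candidate positions examined
    n_seq    -- positions matching the sequence pattern
    n_struct -- positions matching the structure pattern
    n_both   -- positions matching both patterns
    """
    if not (0 <= n_both <= min(n_seq, n_struct) <= max(n_seq, n_struct) <= n_total):
        raise ValueError("inconsistent match counts")
    # Number of ways to choose the structure matches among all positions.
    denom = comb(n_total, n_struct)
    # Sum the hypergeometric probabilities for every overlap >= n_both.
    tail = sum(
        comb(n_seq, k) * comb(n_total - n_seq, n_struct - k)
        for k in range(n_both, min(n_seq, n_struct) + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy counts, invented for illustration: 10 positions, 3 sequence
    # matches, 4 structure matches, all 3 sequence matches also match structure.
    print(overlap_p_value(10, 3, 4, 3))
```

A small p-value means the sequence and structure matches coincide more often than chance would predict, which is the kind of correlation the abstract describes; a real implementation would also need to account for the number of patterns tested.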
