FAUST: An Algorithm for Extracting Functionally Relevant Templates from Protein Structures

FAUST (Functional Annotations Using Structural Templates) is an algorithm for extracting functionally relevant templates from protein structures and for using such templates to annotate novel structures. Proteins and structural templates are represented as colored, undirected graphs with atoms as nodes and interatomic distances as edge weights. Node colors are based on the chemical identities of the atoms. Two edge labels are considered equivalent if the interatomic distances for the corresponding pairs of nodes (atoms) differ by less than a threshold value. We define a FAUST structural template as a common subgraph of a set of graphs corresponding to two or more functionally related proteins. Pairs of functionally related protein structures are searched for sets of chemically equivalent atoms whose interatomic distances are conserved in both structures. The structural templates resulting from these pairwise searches are then combined to maximize classification performance on a training set of non-redundant protein structures. The resulting structural template provides a new language for describing structure-function relationships in proteins. These templates are used to identify active sites and binding sites in protein structures. Here we demonstrate structural template extraction for the highly divergent family of serine proteases. We compare FAUST templates with the standard description of active-site pattern conservation in serine proteases and demonstrate the depth of information captured by the template description. We also present preliminary results of high-throughput annotation of a protein structure database with a comprehensive library of FAUST templates.
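The pairwise search described above can be illustrated with a small sketch. It is not the FAUST implementation, whose search procedure is not given here; it swaps in a standard technique for finding common subgraphs, namely maximum-clique detection (Bron-Kerbosch with pivoting) in a correspondence graph. The function names, the `(color, (x, y, z))` atom representation, and the tolerance value are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def correspondence_graph(atoms_a, atoms_b, tol=0.5):
    """Build a correspondence graph for two structures (illustrative only).

    Each atom is a hypothetical (color, (x, y, z)) tuple. A node (i, j)
    pairs atom i of A with atom j of B when their colors match. Two nodes
    are connected when the interatomic distance in A and the one in B
    differ by less than `tol` (an assumed threshold).
    """
    nodes = [(i, j)
             for i, (color_a, _) in enumerate(atoms_a)
             for j, (color_b, _) in enumerate(atoms_b)
             if color_a == color_b]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # an atom may be matched at most once
        d_a = dist(atoms_a[i][1], atoms_a[k][1])
        d_b = dist(atoms_b[j][1], atoms_b[l][1])
        if abs(d_a - d_b) < tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Return one maximum clique using Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


# Toy data: B contains a translated, reordered copy of three atoms of A.
atoms_a = [("N", (0, 0, 0)), ("O", (3, 0, 0)), ("O", (0, 4, 0)), ("C", (10, 10, 10))]
atoms_b = [("O", (1, 5, 1)), ("N", (1, 1, 1)), ("O", (4, 1, 1)), ("S", (7, 7, 7))]

# Each pair (i, j) says atom i of A corresponds to atom j of B.
template = sorted(max_clique(correspondence_graph(atoms_a, atoms_b)))
print(template)
```

Each clique in the correspondence graph is a set of color-compatible atom pairs whose mutual distances agree within the threshold in both structures, which matches the definition of a common subgraph used above. Exhaustive clique search scales poorly with structure size, so a practical implementation would need to restrict the candidate atoms or use a faster search.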
