Enhanced functional annotation of protein sequences via the use of structural descriptors.

To circumvent the limitations of sequence-based methods for predicting protein function, we have developed a methodology built on a sequence-to-structure-to-function paradigm. First, an approximate three-dimensional structure is predicted. Then a three-dimensional descriptor of the functional site, termed a Fuzzy Functional Form (FFF), is used to screen the structure for the presence of the functional site of interest (Fetrow et al., 1998; Fetrow and Skolnick, 1998). Previously, a disulfide oxidoreductase FFF was developed and applied to predicted structures obtained from a small structural database. Here, using a substantially larger structural database, we extend the analysis of the disulfide oxidoreductase FFF to the B. subtilis genome. To assess the performance of the FFF, its results are compared with those obtained using the sequence alignment method BLAST and three local sequence motif databases: PRINTS, Prosite, and Blocks. The FFF method is then compared in detail with Blocks, and the FFF is shown to be more flexible and sensitive in finding a specific function in a set of unknown proteins. In addition, the estimated false positive rate of function prediction is significantly lower with the FFF structural motif than with the standard sequence motif methods. We also present a second FFF and describe a specific example from its whole-genome application to D. melanogaster using a newer threading algorithm. Together, these results indicate that three-dimensional structural information adds significant value to the prediction of biochemical function of genomic sequences.
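
The screening step can be pictured as a geometric search over a predicted structure. The following is a minimal Python sketch under stated assumptions, not the authors' FFF implementation: it assumes a descriptor can be reduced to a list of required residue types plus pairwise Cα distance ranges. The names `Residue`, `Descriptor`, `DistanceConstraint` and `find_sites`, as well as the residue types, coordinates, distances and tolerances in the toy example, are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Residue:
    index: int                       # position in the sequence
    name: str                        # three-letter residue code
    ca: Tuple[float, float, float]   # C-alpha coordinates in angstroms


@dataclass(frozen=True)
class DistanceConstraint:
    i: int            # position in the descriptor's residue list
    j: int            # position in the descriptor's residue list
    distance: float   # target C-alpha distance in angstroms
    tolerance: float  # allowed deviation; this is the "fuzzy" part


@dataclass(frozen=True)
class Descriptor:
    residues: Tuple[str, ...]                    # required residue types
    constraints: Tuple[DistanceConstraint, ...]  # pairwise geometry


def find_sites(structure: List[Residue], descriptor: Descriptor) -> List[Tuple[int, ...]]:
    """Return every tuple of residue indices that satisfies the descriptor."""
    matches: List[Tuple[int, ...]] = []

    def extend(chosen: List[Residue]) -> None:
        k = len(chosen)
        if k == len(descriptor.residues):
            matches.append(tuple(r.index for r in chosen))
            return
        for res in structure:
            if res.name != descriptor.residues[k] or res in chosen:
                continue
            candidate = chosen + [res]
            # Check only the constraints that become testable once slot k is filled.
            ok = all(
                abs(math.dist(candidate[c.i].ca, candidate[c.j].ca) - c.distance) <= c.tolerance
                for c in descriptor.constraints
                if max(c.i, c.j) == k
            )
            if ok:
                extend(candidate)

    extend([])
    return matches


# Toy data (hypothetical values, not a real active-site definition).
toy_structure = [
    Residue(1, "CYS", (0.0, 0.0, 0.0)),
    Residue(2, "GLY", (3.8, 0.0, 0.0)),
    Residue(3, "PRO", (5.0, 3.0, 0.0)),
    Residue(4, "CYS", (5.5, 0.0, 0.0)),
    Residue(5, "PRO", (0.0, 8.0, 0.0)),
]
toy_descriptor = Descriptor(
    residues=("CYS", "CYS", "PRO"),
    constraints=(
        DistanceConstraint(0, 1, 5.5, 1.0),
        DistanceConstraint(0, 2, 8.0, 1.0),
        DistanceConstraint(1, 2, 9.5, 1.0),
    ),
)
sites = find_sites(toy_structure, toy_descriptor)
print(sites)
```

The tolerance on each distance is what lets such a descriptor be applied to approximate, predicted structures rather than only to high-resolution experimental ones. A sequence motif, by contrast, constrains residue spacing along the chain and cannot express this kind of spatial relationship directly.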

[1]  H. Eklund,et al.  Crystal structure of thioredoxin from Escherichia coli at 1.68 Å resolution. , 1990, Journal of molecular biology.

[2]  NMR structure of oxidized Escherichia coli glutaredoxin: Comparison with reduced E. coli glutaredoxin and functionally related proteins , 1992, Protein science : a publication of the Protein Society.

[3]  H. Eklund,et al.  Structure of oxidized bacteriophage T4 glutaredoxin (thioredoxin). Refinement of native and mutant proteins. , 1992, Journal of molecular biology.

[4]  John Kuriyan,et al.  Crystal structure of the DsbA protein required for disulphide bond formation in vivo , 1993, Nature.

[5]  D. Barford,et al.  Crystal structure of human protein tyrosine phosphatase 1B. , 1994, Science.

[6]  P. Willett,et al.  A graph-theoretic approach to the identification of three-dimensional patterns of amino acid side-chains in protein structures. , 1994, Journal of molecular biology.

[7]  S. Henikoff,et al.  Protein family classification based on searching a database of blocks. , 1994, Genomics.

[8]  R. Nussinov,et al.  Three‐dimensional, sequence order‐independent structural comparison of a serine protease against the crystallographic database reveals active site similarities: Potential implications to evolution and to protein folding , 1994, Protein science : a publication of the Protein Society.

[9]  Yanfeng Yang,et al.  Crystal structure of thioltransferase at 2.2 Å resolution , 1995, Protein science : a publication of the Protein Society.

[10]  Burkhard Rost,et al.  TOPITS: Threading One-Dimensional Predictions Into Three-Dimensional Structures , 1995, ISMB.

[11]  H. Eklund,et al.  Crystal structure of thioredoxin-2 from Anabaena. , 1995, Structure.

[12]  Jack E. Dixon,et al.  Crystal Structure of the Dual Specificity Protein Phosphatase VHR , 1996, Science.

[13]  S. Bryant Evaluation of threading specificity and accuracy , 1996, Proteins.

[14]  G. Powis,et al.  Crystal structures of reduced, oxidized, and mutated human thioredoxins: evidence for a regulatory homodimer. , 1996, Structure.

[15]  J R Gunn,et al.  Computational studies of protein folding. , 1996, Annual review of biophysics and biomolecular structure.

[16]  R. Glockshuber,et al.  Structural analysis of three His32 mutants of DsbA: Support for an electrostatic role of His32 in DsbA stability , 1997, Protein science : a publication of the Protein Society.

[17]  J. Skolnick,et al.  MONSSTER: a method for folding globular proteins with a small number of distance restraints. , 1997, Journal of molecular biology.

[18]  J. Thornton,et al.  Tess: A geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites , 1997, Protein science : a publication of the Protein Society.

[19]  M J Sippl,et al.  Protein folds from pair interactions: A blind test in fold recognition , 1997, Proteins.

[20]  R. Taylor,et al.  Structure of TcpG, the DsbA protein folding catalyst from Vibrio cholerae. , 1997, Journal of molecular biology.

[21]  Gapped BLAST and PSI-BLAST: A new , 1997 .

[22]  Y. Kasahara,et al.  Characterization of an lrp-like (lrpC) gene from Bacillus subtilis , 1997, Molecular and General Genetics MGG.

[23]  Amos Bairoch,et al.  The PROSITE database, its status in 1997 , 1997, Nucleic Acids Res..

[24]  M. Blackledge,et al.  NMR solution structure of an oxidised thioredoxin h from the eukaryotic green alga Chlamydomonas reinhardtii. , 1997, European journal of biochemistry.

[25]  C. Chothia,et al.  Population statistics of protein structures: lessons from structural classifications. , 1997, Current opinion in structural biology.

[26]  J Skolnick,et al.  Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity. , 1998, Journal of molecular biology.

[27]  Leszek Rychlewski,et al.  Fold prediction by a hierarchy of sequence, threading, and modeling methods , 1998, Protein science : a publication of the Protein Society.

[28]  Y. Zhao,et al.  Molecular Basis for Substrate Specificity of Protein-tyrosine Phosphatase 1B , 1998, The Journal of Biological Chemistry.

[29]  Sean R. Eddy,et al.  Pfam: multiple sequence alignments and HMM-profiles of protein domains , 1998, Nucleic Acids Res..

[30]  A Tramontano,et al.  Homology modeling with low sequence identity. , 1998, Methods.

[31]  A. Sali,et al.  Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[32]  J Skolnick,et al.  Functional analysis of the Escherichia coli genome for members of the alpha/beta hydrolase family. , 1998, Folding & design.

[33]  Jacquelyn S. Fetrow,et al.  Functional analysis of the Escherichia coli genome for members of the α/β hydrolase family , 1998 .

[34]  J. Skolnick,et al.  Method for prediction of protein function from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxins/thioredoxins and T1 ribonucleases. , 1998, Journal of molecular biology.

[35]  Marie Zhang,et al.  Crystal Structure of a Human Low Molecular Weight Phosphotyrosyl Phosphatase , 1998, The Journal of Biological Chemistry.

[36]  P D Karp,et al.  What we do not know about sequence analysis and sequence databases. , 1998, Bioinformatics.

[37]  Richard Hughey,et al.  Hidden Markov models for detecting remote protein homologies , 1998, Bioinform..

[38]  P. Bork,et al.  Predicting functions from protein sequences—where are the bottlenecks? , 1998, Nature Genetics.

[39]  Terri K. Attwood,et al.  The PRINTS protein fingerprint database in its fifth year , 1998, Nucleic Acids Res..

[40]  S F Altschul,et al.  Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. , 1998, Trends in biochemical sciences.

[41]  Miguel A. Andrade-Navarro,et al.  Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families , 1998, Bioinform..

[42]  J. Galán,et al.  Identification of a Specific Chaperone for SptP, a Substrate of the Centisome 63 Type III Secretion System of Salmonella typhimurium , 1998, Journal of bacteriology.

[43]  Thomas L. Madden,et al.  Protein sequence similarity searches using patterns as seeds. , 1998, Nucleic acids research.

[44]  George D. Rose,et al.  Identifying two ancient enzymes in Archaea using predicted secondary structure alignment , 1999, Nature Structural Biology.

[45]  Peer Bork,et al.  Evaluation of human-readable annotation in biomolecular sequence databases with biological rule libraries , 1999, Bioinform..

[46]  Robert D. Finn,et al.  Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins , 1999, Nucleic Acids Res..

[47]  D. T. Jones,et al.  Successful recognition of protein folds using threading methods biased by sequence similarity and predicted secondary structure , 1999, Proteins.

[48]  Miguel A. Andrade-Navarro,et al.  Automated genome sequence analysis and annotation , 1999, Bioinform..

[49]  Lawrence Hunter,et al.  Mining molecular binding terminology from biomedical text , 1999, AMIA.

[50]  D. Fischer Modeling three‐dimensional protein structures for amino acid sequences of the CASP3 experiment using sequence‐derived predictions , 1999, Proteins.

[51]  Amos Bairoch,et al.  The PROSITE database, its status in 1999 , 1999, Nucleic Acids Res..

[52]  Temple F. Smith,et al.  The WD repeat: a common architecture for diverse functions. , 1999, Trends in biochemical sciences.

[53]  P Rotkiewicz,et al.  A method for the improvement of threading‐based protein models , 1999, Proteins.

[54]  A. Murzin Structure classification‐based assessment of CASP3 predictions for the fold recognition targets , 1999, Proteins.

[55]  J Moult,et al.  Predicting protein three-dimensional structure. , 1999, Current opinion in biotechnology.

[56]  H. Misawa,et al.  Intracellular signaling factors--enhanced hepatic nuclear protein binding to TTGGC sequence in the rat regucalcin gene promoter: involvement of protein phosphorylation. , 2000, Biochemical and biophysical research communications.

[57]  C. Ouzounis,et al.  Automatic extraction of protein interactions from scientific abstracts. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[58]  W A Koppensteiner,et al.  Characterization of novel proteins based on known protein structures. , 2000, Journal of molecular biology.

[59]  I. Hoffmann,et al.  Cell cycle regulation by the Cdc25 phosphatase family. , 2000, Progress in cell cycle research.

[60]  R C Wade,et al.  Nuclear receptor-DNA binding specificity: A COMBINE and Free-Wilson QSAR analysis. , 2000, Journal of medicinal chemistry.

[61]  H. Misawa,et al.  Translocation of regucalcin to rat liver nucleus: involvement of nuclear protein kinase and protein phosphatase regulation. , 2000, International journal of molecular medicine.

[62]  D. Osguthorpe Ab initio protein folding. , 2000, Current opinion in structural biology.

[63]  Graziano Pesole,et al.  PatSearch: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance , 2000, Bioinform..

[64]  T K Attwood,et al.  The quest to deduce protein function from sequence: the role of pattern databases. , 2000, The international journal of biochemistry & cell biology.

[65]  M. Sternberg,et al.  Enhanced genome annotation using structural profiles in the program 3D-PSSM. , 2000, Journal of molecular biology.

[66]  A. Godzik,et al.  Comparison of sequence profiles. Strategies for structural predictions using sequence information , 2000, Protein science : a publication of the Protein Society.

[67]  J. Bliska,et al.  Identification of Residues in the N-terminal Domain of the Yersinia Tyrosine Phosphatase That Are Critical for Substrate Recognition , 2001, The Journal of Biological Chemistry.

[68]  J Skolnick,et al.  Defrosting the frozen approximation: PROSPECTOR— A new approach to threading , 2001, Proteins.

[69]  Andrzej Kolinski,et al.  Computational studies of protein folding , 2001, Comput. Sci. Eng..

[70]  Amos Bairoch,et al.  The PROSITE database, its status in 2002 , 2002, Nucleic Acids Res..