Motif‐based fold assignment

Conventional fold recognition techniques rely mainly on analysis of the entire sequence of a protein. We present a motif‐based fold assignment (MBA) method that can improve the performance of any conventional sequence‐based fold assignment. The method uses sequence motifs, such as those defined in the PROSITE database, together with the SWISS‐PROT annotation of the fold library. When combined with a simple SDP method, MBA achieves coverage comparable to that obtained with PSI‐BLAST. However, the set of MBA predictions differs significantly from the set of PSI‐BLAST predictions, so the combined MBA/PSI‐BLAST method yields a 40% increase in coverage. The MBA approach can easily be adapted to include the results of sequence‐independent function prediction methods and alternative motif and annotation databases. The method is available through the web server located at http://www.doe‐mbi.ucla.edu/mba.
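The core idea (scan a query for sequence motifs, then transfer folds from library entries whose annotation lists the same motifs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published MBA scoring: the data layouts (a motif id to pattern map, and a fold id to set of motif ids standing in for the SWISS‐PROT annotation of the fold library), the motif and fold ids, and ranking folds by the number of shared motif hits are all hypothetical. The PROSITE pattern translation covers only the common syntax. The `coverage` helper shows why merging two methods whose prediction sets differ raises coverage: a query counts as covered if any method assigns it a fold.

```python
import re
from collections import Counter


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles only the common syntax: 'x' (any residue), [..] (allowed residues),
    {..} (forbidden residues), repeat counts (n) or (n,m), and the '<' / '>'
    terminal anchors. A '>' inside brackets is NOT supported in this sketch.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split an element such as "x(2,4)" into core "x" and repeat bounds.
        core, lo, hi = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        # [..] classes and single residues are already valid regex syntax.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def assign_folds(query, motifs, fold_annotations):
    """Rank candidate folds for `query` by the number of shared motif hits.

    motifs: motif id -> PROSITE-style pattern (hypothetical ids).
    fold_annotations: fold id -> set of motif ids from the annotation of that
    fold's library entries (hypothetical data layout).
    """
    hits = {mid for mid, pat in motifs.items() if re.search(prosite_to_regex(pat), query)}
    votes = Counter({fold: len(hits & ids)
                     for fold, ids in fold_annotations.items() if hits & ids})
    return [fold for fold, _ in votes.most_common()]


def coverage(*prediction_sets, n_queries):
    """Fraction of queries that receive at least one fold from any method."""
    covered = set().union(*({q for q, folds in p.items() if folds}
                            for p in prediction_sets))
    return len(covered) / n_queries


if __name__ == "__main__":
    motifs = {"M1": "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."}  # illustrative pattern
    folds = {"foldA": {"M1"}, "foldB": {"M2"}}                       # hypothetical library
    query = "AACAACAAAL" + "A" * 8 + "HAAAHAA"
    print(assign_folds(query, motifs, folds))
```

Running the script prints the folds whose annotated motifs the query matches. In a real setting the motif patterns would come from PROSITE and the fold annotations from the SWISS‐PROT entries of the fold library, and the per-method predictions passed to `coverage` would come from MBA and PSI‐BLAST runs.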
