Paper information - GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences.

A new protein fold recognition method is described which is both fast and reliable. The method uses a traditional sequence alignment algorithm to generate alignments, which are then evaluated by a method derived from threading techniques. As a final step, each threaded model is evaluated by a neural network to produce a single measure of confidence in the proposed prediction. The speed of the method, along with its sensitivity and very low false-positive rate, makes it ideal for automatically predicting the structure of all the proteins in a translated bacterial genome (proteome). The method has been applied to the genome of Mycoplasma genitalium, and analysis of the results shows that as many as 46% of the proteins derived from the predicted protein coding regions have a significant relationship to a protein of known structure. In some cases, however, only one domain of the protein can be predicted, giving a total coverage of 30% when calculated as a fraction of the number of amino acid residues in the whole proteome.
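The gap between the two coverage figures in the abstract (46% of proteins versus 30% of residues) follows from how each fraction is counted. The sketch below illustrates the arithmetic only; the `Protein` class, both function names, and the toy numbers are hypothetical and are not taken from the paper or from the GenTHREADER software.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    """Hypothetical record for one protein in a proteome."""
    length: int            # residues in the full-length protein
    covered_residues: int  # residues inside a confidently matched region (0 if no match)


def coverage_by_protein(proteins):
    """Fraction of proteins with at least one confident structural match."""
    hits = sum(1 for p in proteins if p.covered_residues > 0)
    return hits / len(proteins)


def coverage_by_residue(proteins):
    """Fraction of all residues in the proteome that fall inside a matched region."""
    covered = sum(p.covered_residues for p in proteins)
    total = sum(p.length for p in proteins)
    return covered / total


# Toy data (invented for illustration): one fully matched protein, one
# protein where only a single domain is matched, and two with no match.
toy = [
    Protein(length=300, covered_residues=300),
    Protein(length=600, covered_residues=150),
    Protein(length=400, covered_residues=0),
    Protein(length=200, covered_residues=0),
]

print(f"by protein: {coverage_by_protein(toy):.0%}")
print(f"by residue: {coverage_by_residue(toy):.0%}")
```

In this toy set, half of the proteins have a match, but the partially matched 600-residue protein contributes only 150 covered residues, so the per-residue figure is lower than the per-protein figure.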

David C. Jones | D. T. Jones
