Enhanced genome annotation using structural profiles in the program 3D-PSSM.

A method, the three-dimensional position-specific scoring matrix (3D-PSSM), for recognising remote protein sequence homologues is described. The method combines the power of multiple sequence profiles with knowledge of protein structure to improve recognition, and thus functional assignment, in newly sequenced genomes. It uses structural alignments of homologous proteins with similar three-dimensional structure, taken from the structural classification of proteins (SCOP) database, to establish which residues are structurally equivalent. These equivalences are used to extend the multiple sequence alignments obtained by standard sequence searches. The resulting large superfamily-based multiple alignment is converted into a PSSM. Combined with secondary structure matching and solvation potentials, 3D-PSSM can recognise structural and functional relationships beyond the reach of state-of-the-art sequence methods. In a cross-validated benchmark on 136 homologous relationships unambiguously undetectable by the position-specific iterated basic local alignment search tool (PSI-BLAST), 3D-PSSM confidently assigns 18%. The method was applied to the remaining unassigned regions of the Mycoplasma genitalium genome, and an additional 13 regions were assigned with 95% confidence. 3D-PSSM is available to the community as a web server: http://www.bmm.icnet.uk/servers/3dpssm
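The step in which a multiple alignment is converted into a PSSM can be illustrated with a minimal sketch. This is not the 3D-PSSM implementation: the uniform background frequencies, the additive pseudocount, the unweighted sequences, and the names `build_pssm` and `score_sequence` are all assumptions made for illustration. Secondary structure matching and solvation potentials are left out.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumptions (not taken from the paper): uniform background
    frequencies, a simple additive pseudocount, no sequence weighting.
    Gap characters and unknown residues are ignored when counting.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(
            seq[col] for seq in alignment if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_sequence(pssm, sequence):
    """Sum per-column scores for an ungapped sequence of matching length."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the number of columns")
    # Residues outside the 20-letter alphabet contribute a neutral 0.0.
    return sum(column.get(res, 0.0) for column, res in zip(pssm, sequence))


if __name__ == "__main__":
    # Toy alignment; '-' marks a gap.
    toy_alignment = ["ACD", "ACE", "AC-"]
    toy_pssm = build_pssm(toy_alignment)
    print(score_sequence(toy_pssm, "ACD") > score_sequence(toy_pssm, "WWW"))
```

In this sketch, residues seen often in a column score above zero and unseen residues score below zero, so a query resembling the aligned family scores higher than an unrelated one. Extending the alignment with structurally equivalent residues, as the abstract describes, would add more rows to `alignment` before the matrix is built.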
