Evaluation of PSI‐BLAST alignment accuracy in comparison to structural alignments

The PSI‐BLAST algorithm has been acknowledged as one of the most powerful tools for detecting remote evolutionary relationships from sequence information alone. This has been demonstrated by its ability to recognize remote structural homologues and by the high coverage it achieves in the annotation of a complete genome. Although recognizing the correct fold of a sequence is of major importance, the accuracy of the alignment is crucial for successfully modeling one sequence on the structure of its remote homologue. Here we assess the accuracy of PSI‐BLAST alignments on a stringent database of 123 structurally similar, sequence‐dissimilar pairs of proteins, by comparing them to alignments defined on a structural basis. Each protein sequence is compared to a nonredundant database of protein sequences by PSI‐BLAST. Whenever a pair member detects its pair‐mate, the positions that are aligned in both the sequence and structural alignments are determined, and the alignment sensitivity is expressed as the percentage of these positions out of the structural alignment. Fifty‐two sequences detected their pair‐mates (for 16 pairs the success was bi‐directional, that is, either pair member detected the other when used as the query). The average percentage of correctly aligned residues per structural alignment was 43.5 ± 2.2%. Other properties of the alignments were also examined, such as sensitivity vs. specificity and the change in these parameters over consecutive iterations. Notably, alignment sensitivity improves over consecutive iterations, reaching an average of 50.9 ± 2.5% within the five iterations tested in the current study.
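The sensitivity measure described above can be illustrated with a minimal Python sketch. It assumes each alignment is represented as a collection of (query position, pair‐mate position) residue pairs; this representation, the function names, and the toy data are illustrative assumptions, not the authors' implementation. The abstract defines sensitivity explicitly but not specificity, so the specificity shown here (shared positions as a percentage of the sequence alignment) is only one plausible reading.

```python
def shared_positions(seq_pairs, struct_pairs):
    """Residue pairs aligned identically in both alignments."""
    return set(seq_pairs) & set(struct_pairs)


def alignment_sensitivity(seq_pairs, struct_pairs):
    """Shared positions as a percentage of the structural alignment."""
    struct = set(struct_pairs)
    if not struct:
        raise ValueError("structural alignment is empty")
    return 100.0 * len(shared_positions(seq_pairs, struct)) / len(struct)


def alignment_specificity(seq_pairs, struct_pairs):
    """Assumed definition: shared positions as a percentage of the
    sequence alignment (not spelled out in the abstract)."""
    seq = set(seq_pairs)
    if not seq:
        raise ValueError("sequence alignment is empty")
    return 100.0 * len(shared_positions(seq, struct_pairs)) / len(seq)


# Toy data (hypothetical): (query residue index, pair-mate residue index)
structural = [(1, 1), (2, 2), (3, 3), (4, 5)]
sequential = [(1, 1), (2, 2), (3, 4), (4, 5), (5, 6)]

sensitivity = alignment_sensitivity(sequential, structural)
specificity = alignment_specificity(sequential, structural)
print(f"sensitivity = {sensitivity:.1f}%, specificity = {specificity:.1f}%")
```

In this toy case three of the four structurally aligned pairs are reproduced by the sequence alignment, while two of the five sequence‐aligned pairs do not appear in the structural alignment.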
