Improving the quality of twilight‐zone alignments

Several recent publications have illustrated the advantages of using sequence profiles to recognize distant homologies between proteins. However, the practical usefulness of distant homology recognition depends not only on the sensitivity of the algorithm but also on the quality of the alignment between a prediction target and the template from the database of known proteins. Here, we study alignment quality for several supersensitive protein comparison algorithms whose recognition sensitivity was compared previously (Rychlewski et al., 2000). A database of protein pairs with similar structures but low sequence similarity is used to rate the alignments obtained with several different methods, including sequence–sequence, sequence–profile, and profile–profile alignment methods. We show that incorporating the evolutionary information encoded in sequence profiles into the alignment calculation significantly increases alignment accuracy, bringing the resulting alignments closer to those obtained from structure comparison.
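
As an illustration of what rating an alignment against a structure-based reference can look like, the following is a minimal sketch and not the scoring procedure used in the study, which the abstract does not specify. It assumes one simple measure, the fraction of residue pairs in the reference alignment that the tested alignment reproduces; the function names `aligned_pairs` and `fraction_correct` and the toy sequences are hypothetical.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs matched in a gapped pairwise alignment.

    row_a and row_b are the two rows of the alignment, of equal length,
    with gaps written as `gap`. Indices count residues, not columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))  # residue i of A is aligned with residue j of B
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def fraction_correct(test_pairs, reference_pairs):
    """Fraction of reference residue pairs that also occur in the tested alignment."""
    if not reference_pairs:
        return 0.0
    return len(test_pairs & reference_pairs) / len(reference_pairs)


# Toy example (made-up sequences): the reference plays the role of a
# structure-based alignment, the test that of a sequence-based one.
reference = aligned_pairs("ACDEFG", "ACD-FG")
test = aligned_pairs("ACDEFG", "AC-DFG")
score = fraction_correct(test, reference)
print(score)
```

In this toy case four of the five reference pairs are reproduced. A measure of this kind rewards only exactly matching residue pairs; more tolerant variants, for example accepting small shifts, are equally plausible choices.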

[1]  Leszek Rychlewski,et al.  Fold prediction by a hierarchy of sequence, threading, and modeling methods , 1998, Protein science : a publication of the Protein Society.

[2]  C Sander,et al.  Dictionary of recurrent domains in protein structures , 1998, Proteins.

[4]  B Rost,et al.  Bridging the protein sequence-structure gap by structure predictions. , 1996, Annual review of biophysics and biomolecular structure.

[5]  W A Koppensteiner,et al.  Sustained performance of knowledge‐based potentials in fold recognition , 1999, Proteins.

[6]  A. Elofsson,et al.  Hidden Markov models that use predicted secondary structures for fold recognition , 1999, Proteins.

[7]  Sean R. Eddy,et al.  Profile hidden Markov models , 1998, Bioinform.

[8]  A Godzik,et al.  Structural diversity in a family of homologous proteins. , 1996, Journal of molecular biology.

[9]  T. Alwyn Jones,et al.  CASP3 comparative modeling evaluation , 1999, Proteins.

[10]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[11]  J Moult,et al.  Predicting protein three-dimensional structure. , 1999, Current opinion in biotechnology.

[12]  P E Bourne,et al.  Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. , 1998, Protein engineering.

[13]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[14]  A. Sali,et al.  Structural genomics: beyond the Human Genome Project , 1999, Nature Genetics.

[15]  A. Godzik,et al.  Topology fingerprint approach to the inverse protein folding problem. , 1992, Journal of molecular biology.

[16]  P. Argos,et al.  An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. , 1995, Journal of molecular biology.

[17]  M. Vingron,et al.  Quantifying the local reliability of a sequence alignment. , 1996, Protein engineering.

[18]  Lihua Yu,et al.  Positional Statistical Significance in Sequence Alignment , 1999, J. Comput. Biol.

[19]  S. Styring,et al.  Structure of donor side components in photosystem II predicted by computer modelling. , 1990, The EMBO journal.

[20]  A. Godzik,et al.  Comparison of sequence profiles. Strategies for structural predictions using sequence information , 2000, Protein science : a publication of the Protein Society.

[21]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[22]  A. Sali,et al.  Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[23]  M. Karplus,et al.  Evaluation of comparative protein modeling by MODELLER , 1995, Proteins.

[24]  D. Fischer Modeling three‐dimensional protein structures for amino acid sequences of the CASP3 experiment using sequence‐derived predictions , 1999, Proteins.

[25]  A. Godzik,et al.  Functional insights from structural predictions: Analysis of the Escherichia coli genome , 1999, Protein science : a publication of the Protein Society.

[26]  J F Gibrat,et al.  Surprising similarities in structure comparison. , 1996, Current opinion in structural biology.

[27]  A. Panchenko,et al.  Threading with explicit models for evolutionary conservation of structure and sequence , 1999, Proteins.

[28]  A. Godzik,et al.  Regularities in interaction patterns of globular proteins. , 1993, Protein engineering.

[29]  David C. Jones,et al.  GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. , 1999, Journal of molecular biology.

[30]  A. Godzik The structural alignment between two proteins: Is there a unique answer? , 1996, Protein science : a publication of the Protein Society.

[31]  David C. Jones,et al.  CATH--a hierarchic classification of protein domain structures. , 1997, Structure.

[32]  A. Murzin Structure classification‐based assessment of CASP3 predictions for the fold recognition targets , 1999, Proteins.

[33]  T J Hubbard RMS/Coverage graphs: A qualitative method for comparing three‐dimensional protein structure predictions , 1999, Proteins.

[34]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[35]  K Karplus,et al.  Predicting protein structure using only sequence information , 1999, Proteins.

[36]  A Godzik,et al.  Multiple model approach--dealing with alignment ambiguities in protein modeling. , 1997, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[37]  D. Haussler,et al.  Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. , 1998, Journal of molecular biology.

[38]  M. Zuker Suboptimal sequence alignment in molecular biology. Alignment with error analysis. , 1991, Journal of molecular biology.

[39]  W. Kabsch A discussion of the solution for the best rotation to relate two sets of vectors , 1978.

[40]  D. Eisenberg,et al.  VERIFY3D: assessment of protein models with three-dimensional profiles. , 1997, Methods in enzymology.

[41]  E. Koonin,et al.  Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. , 1999, Journal of molecular biology.