Predicting reliable regions in protein sequence alignments

MOTIVATION: Protein sequence alignments have myriad applications in bioinformatics, including secondary and tertiary structure prediction, homology modeling, and phylogeny. Unfortunately, all alignment methods make mistakes, and mistakes in alignments often yield mistakes in their application. Thus, a method to identify and remove suspect alignment positions could benefit many areas of protein sequence analysis.

RESULTS: We tested four predictors of alignment position reliability, including near-optimal alignment information, column score, and secondary structural information. We validated each predictor against a large library of alignments, removing positions predicted as unreliable. Near-optimal alignment information was the best predictor, removing 70% of the substantially misaligned positions and 58% of the over-aligned positions, while retaining 86% of those aligned accurately.
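As an illustration of the kind of bookkeeping behind the removal and retention percentages, here is a minimal sketch. It assumes each alignment position carries a reliability score from some predictor and a reference label; the function name `removal_rates`, the label strings, the threshold, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def removal_rates(positions, threshold):
    """Fraction of positions removed per reference label.

    positions: iterable of (score, label) pairs, where score is a
    hypothetical reliability score and label is the reference category.
    A position is removed when its score falls below the threshold.
    """
    positions = list(positions)
    total = Counter(label for _, label in positions)
    removed = Counter(label for score, label in positions if score < threshold)
    return {label: removed[label] / total[label] for label in total}

# Toy data (made up for illustration only).
positions = [
    (0.90, "accurate"), (0.80, "accurate"), (0.30, "accurate"), (0.95, "accurate"),
    (0.20, "misaligned"), (0.10, "misaligned"), (0.70, "misaligned"),
    (0.40, "over-aligned"), (0.60, "over-aligned"),
]

rates = removal_rates(positions, threshold=0.5)
retained_accurate = 1.0 - rates["accurate"]
```

Under these assumptions, a good predictor is one with high removal rates for the misaligned and over-aligned labels and a high `retained_accurate` value at the same threshold.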
