Enhanced statistics for local alignment of multiple alignments improves prediction of protein function and structure

MOTIVATION: Improved comparisons of multiple sequence alignments (profiles) with other profiles can identify subtle relationships between protein families and motifs that lie well beyond the resolution of sequence-based comparisons.

RESULTS: The local alignment of multiple alignments (LAMA) method was modified to estimate the significance of alignment scores with a new measure based on Fisher's combining method. To verify the new procedure, we used known protein structures, sequence annotations, and sets of consistently aligned blocks from cyclical relations consistency analysis (CYRCA). The new significance measure improved the sensitivity of LAMA without altering its selectivity. The program performed better than other profile-to-profile methods (COMPASS and Prof_sim) and a sequence-to-profile method (PSI-BLAST). The testing was large scale and covered several parameters, including pseudo-count profile calculations and the choice between local ungapped blocks and more extended gapped profiles. This comparison provides guidelines on the relative advantages of each method in different cases. We demonstrate and discuss the unique advantages of using block multiple alignments of protein motifs.
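
For readers unfamiliar with Fisher's combining method, the following is a minimal sketch of the general statistical technique, not of the LAMA implementation. It assumes a set of independent p-values (how LAMA derives and selects the p-values it combines is not described in this abstract). The function name `fisher_combined_p` is hypothetical. The statistic is X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis; for even degrees of freedom the tail probability has a closed form, so only the standard library is needed.

```python
import math


def fisher_combined_p(pvalues):
    """Combine independent p-values with Fisher's method.

    Illustrative only: assumes the p-values are independent and lie in (0, 1].
    Returns P(chi2 with 2k degrees of freedom >= X), where X = -2 * sum(ln p_i).
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    # t = X / 2 = -sum(ln p_i)
    t = -sum(math.log(p) for p in pvalues)

    # Closed-form chi-square survival function for 2k degrees of freedom:
    # exp(-t) * sum_{i=0}^{k-1} t^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= t / i
        total += term
    return math.exp(-t) * total


if __name__ == "__main__":
    # Several individually weak p-values can combine into stronger evidence.
    print(fisher_combined_p([0.04, 0.06, 0.05]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed form.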
