Approximate P-Values for Local Sequence Alignments: Numerical Studies

Siegmund and Yakir (2000) have given an approximate p-value when two independent, identically distributed sequences from a finite alphabet are optimally aligned based on a scoring system that rewards similarities according to a general scoring matrix and penalizes gaps (insertions and deletions). The approximation involves an infinite sequence of difficult-to-compute parameters. In this paper, it is shown by numerical studies that these reduce to essentially two numerically distinct parameters, which can be computed as one-dimensional numerical integrals. For an arbitrary scoring matrix and affine gap penalty, this modified approximation is easily evaluated. Comparison with published numerical results shows that it is reasonably accurate.
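
For concreteness, the statistic whose p-value is being approximated, the optimal local alignment score under a scoring matrix and an affine gap penalty, can be sketched as follows. This is a minimal illustration using the standard three-state dynamic programming recursion for affine gaps, not the approximation studied in the paper. The function names, the match/mismatch scores, and the convention that a gap of length k costs `gap_open + k * gap_extend` are assumptions made for this example.

```python
def match_mismatch(a, b, match=2, mismatch=-1):
    """Hypothetical toy scoring function; a real study would use a
    substitution matrix such as PAM or BLOSUM instead."""
    return match if a == b else mismatch


def local_alignment_score(x, y, score=match_mismatch, gap_open=2, gap_extend=1):
    """Best local alignment score of x and y with affine gap penalties.

    Assumed convention: a gap of length k costs gap_open + k * gap_extend.
    Uses O(len(y)) memory by keeping only the previous row.
    """
    neg_inf = float("-inf")
    n = len(y)
    h_prev = [0] * (n + 1)        # best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * (n + 1)  # same, but ending in a gap that skips letters of x
    best = 0
    for i in range(1, len(x) + 1):
        h = [0] * (n + 1)
        f = [neg_inf] * (n + 1)
        e = neg_inf               # ending in a gap that skips letters of y
        for j in range(1, n + 1):
            # Extend an existing gap or open a new one.
            e = max(e - gap_extend, h[j - 1] - gap_open - gap_extend)
            f[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open - gap_extend)
            diag = h_prev[j - 1] + score(x[i - 1], y[j - 1])
            # The 0 lets a local alignment restart anywhere.
            h[j] = max(0, diag, e, f[j])
            best = max(best, h[j])
        h_prev, f_prev = h, f
    return best
```

A p-value approximation of the kind discussed here gives the probability that this score exceeds a threshold when `x` and `y` are independent random sequences; the sketch only computes the score itself.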

[1] A. Dembo, et al. Limit Distribution of Maximal Non-Aligned Two-Sequence Segmental Score, 1994.

[2] M. O. Dayhoff, et al. Atlas of Protein Sequence and Structure, 1965.

[3] W. Pearson. Comparison of methods for searching protein sequence databases, 1995, Protein Science: a publication of the Protein Society.

[4] M. S. Waterman, et al. Identification of common molecular subsequences, 1981, Journal of Molecular Biology.

[5] Amir Dembo, et al. Limit Distributions of Maximal Segmental Score among Markov-Dependent Partial Sums, 1992.

[6] M. Woodroofe. Nonlinear Renewal Theory in Sequential Analysis, 1987.

[7] S. F. Altschul, et al. Local alignment statistics, 1996, Methods in Enzymology.

[8] Richard Mott, et al. Approximate Statistics of Gapped Alignments, 1999, J. Comput. Biol.

[9] D. Siegmund. Sequential Analysis: Tests and Confidence Intervals, 1985.

[10] S. Henikoff, et al. Amino acid substitution matrices from protein blocks, 1992, Proceedings of the National Academy of Sciences of the United States of America.

[11] S. Altschul, et al. The estimation of statistical parameters for local alignment score distributions, 2001, Nucleic Acids Research.

[12] A. B. Robinson, et al. Distribution of glutamine and asparagine residues and their near neighbors in peptides and proteins, 1991, Proceedings of the National Academy of Sciences of the United States of America.

[13] David Siegmund, et al. The maximum of a function of a Markov chain and application to linkage analysis, 1999, Advances in Applied Probability.

[14] E. Myers, et al. Basic local alignment search tool, 1990, Journal of Molecular Biology.

[15] L. Gordon, et al. Two moments suffice for Poisson approximations: the Chen-Stein method, 1989.

[16] M. Waterman, et al. The Erdős-Rényi Law in Distribution, for Coin Tossing and Sequence Matching, 1990.

[17] Michael S. Waterman, et al. Introduction to Computational Biology: Maps, Sequences and Genomes, 1998.

[18] Thomas L. Madden, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, 1997, Nucleic Acids Research.

[19] A new representation for a renewal-theoretic constant appearing in asymptotic approximations of large deviations, 1998.

[20] S. Asmussen. Risk theory in a Markovian environment, 1989.

[21] P. Ney, et al. Limit Theorems for Semi-Markov Processes and Renewal Theory for Markov Chains, 1978.

[22] Martin Vingron, et al. Sequence Comparison Significance and Poisson Approximation, 1994.

[23] Benjamin Yakir, et al. Approximate p-values for local sequence alignments, 2000.