Applications of the Scan Statistic in DNA Sequence Analysis

Advances in biochemical techniques have made large databases of long DNA sequences available. These sequences are conglomerates of random and nonrandom letter strings over the nucleotide alphabet {A, C, G, T}. As the databases expand, mathematical methods play an increasingly important role in analyzing and interpreting the rapidly accumulating DNA data. In this chapter, we discuss a specific example: identifying nonrandom clusters of palindromes in a family of herpesvirus genomes using the r-scan statistic. Palindrome positions on the genome are modeled as i.i.d. random variables uniformly distributed on the unit interval (0,1). After comparing three Poisson-type approximations, we compute the r-scan distribution using a compound Poisson approximation proposed by Glaz (1994). Some of the significant palindrome clusters are located in genome regions containing origins of replication and regulatory signals of the herpesviruses.
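The r-scan setup described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It takes an r-scan length to be X_(i+r) - X_(i), the distance spanned by r consecutive spacings among the sorted positions. The significance of the observed minimal r-scan is estimated by Monte Carlo simulation under the i.i.d. Uniform(0,1) model. The function `naive_poisson_pvalue` is a plain Poisson approximation that ignores clumping of overlapping scans. It is not the compound Poisson approximation of Glaz (1994). It relies only on the standard fact that a single r-scan length follows a Beta(r, n - r + 1) distribution, so P(R <= a) = P(Binomial(n, a) >= r). All function names are hypothetical.

```python
import random
from math import comb, exp


def r_scan_lengths(positions, r):
    """Return X_(i+r) - X_(i) for every i, after sorting the positions (assumed definition)."""
    xs = sorted(positions)
    if r < 1 or r >= len(xs):
        raise ValueError("need 1 <= r < number of positions")
    return [xs[i + r] - xs[i] for i in range(len(xs) - r)]


def min_r_scan(positions, r):
    """Smallest r-scan length: the tightest cluster of r + 1 consecutive points."""
    return min(r_scan_lengths(positions, r))


def mc_pvalue_min_r_scan(observed_min, n, r, trials=10_000, seed=0):
    """Monte Carlo estimate of P(min r-scan <= observed_min) for n i.i.d. Uniform(0,1) points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if min_r_scan(sample, r) <= observed_min:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


def binom_tail(n, p, k):
    """P(Binomial(n, p) >= k)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def naive_poisson_pvalue(a, n, r):
    """Plain (non-compound) Poisson approximation to P(min r-scan <= a); ignores clumping.

    Each r-scan length is Beta(r, n - r + 1), so P(R <= a) = P(Binomial(n, a) >= r).
    There are n - r such scans among n sorted points.
    """
    expected_small_scans = (n - r) * binom_tail(n, a, r)
    return 1 - exp(-expected_small_scans)


if __name__ == "__main__":
    rng = random.Random(1)
    n, r = 100, 4
    positions = [rng.random() for _ in range(n)]  # stand-in data, not real palindrome positions
    m = min_r_scan(positions, r)
    print(f"min {r}-scan = {m:.4f}")
    print("Monte Carlo p-value:", mc_pvalue_min_r_scan(m, n, r, trials=2000))
    print("naive Poisson p-value:", naive_poisson_pvalue(m, n, r))
```

For real data, palindrome positions in base pairs would first be divided by the genome length to place them on (0,1), which is an assumption matching the uniform model above. Overlapping r-scans share spacings, so small scans tend to occur in clumps. This is why the naive Poisson value can be inaccurate and why compound Poisson approximations are used instead.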

Ming-Ying Leung, Traci E. Yamashita

[1] Joseph Glaz, et al. Approximations and Bounds for the Distribution of the Scan Statistic, 1989.

[2] Mark Berman, et al. A Useful Upper Bound for the Tail Probabilities of the Scan Statistic When the Sample Size is Large, 1985.

[3] R. Doolittle. Molecular evolution: computer analysis of protein and nucleic acid sequences, 1990, Methods in Enzymology.

[4] S. Karlin, et al. Human cytomegalovirus origin of DNA replication (oriLyt) resides within a highly complex repetitive region, 1992, Proceedings of the National Academy of Sciences of the United States of America.

[5] S. Karlin, et al. A second course in stochastic processes, 1981.

[6] S. Karlin, et al. Frequent oligonucleotides and peptides of the Haemophilus influenzae genome, 1996, Nucleic Acids Research.

[7] Compound Poisson approximations for the numbers of extreme spacings, 1993, Advances in Applied Probability.

[8] D. Aldous. Probability Approximations via the Poisson Clumping Heuristic, 1988.

[9] Joseph Naus, et al. Poisson approximations for the distribution and moments of ordered m-spacings, 1994.

[10] B. E. Griffin, et al. Epstein-Barr virus in epithelial cell tumors: a breast cancer study, 1995, Cancer Research.

[11] S. Karlin, et al. An efficient algorithm for identifying matches with errors in multiple long molecular sequences, 1991, Journal of Molecular Biology.