Significance of Interspecies Matches when Evolutionary Rate Varies

We develop techniques to estimate the statistical significance of gap-free alignments between two genomic DNA sequences, using human-mouse alignments as an example. The sequences are assumed to be sufficiently similar that some but not all of the neutrally evolving regions (i.e., those under no evolutionary constraint) can be reliably aligned. Our goal is to model the situation in which the neutral rate of evolution, and hence the extent of the aligning intervals, varies across the genome. In some cases, this permits the weaker of two matches to be judged as less likely to have arisen by chance, provided it lies in a genomic interval with a high level of background divergence. We employ a hidden Markov model to capture variations in divergence rates and assign probability values to gap-free alignments using techniques of Dembo and Karlin, which are related to those used for the same purpose by BLAST. Our methods are illustrated in detail using a 1.49 Mb genomic region. Results obtained from the analysis of human chromosome 22 using these techniques are also provided.
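The claim that a weaker match can be the more significant one may be easier to see in a simplified setting. The sketch below is not the method of this paper: it assumes independent alignment columns with a fixed neutral match probability in each region, instead of the hidden Markov model and the Markov-dependent theory of Dembo and Karlin. The scoring scheme (+1 for a match, -3 for a mismatch), the match probabilities, the constant K, and the function names are all illustrative assumptions. For each region it solves for the Karlin-Altschul parameter lambda, the positive root of p*exp(lambda*match) + (1-p)*exp(lambda*mismatch) = 1, and uses the standard tail approximation 1 - exp(-K*N*exp(-lambda*S)).

```python
import math


def karlin_lambda(p_match, match=1.0, mismatch=-3.0):
    """Positive root of p*exp(l*match) + (1-p)*exp(l*mismatch) = 1.

    Assumes independent columns, each a match with probability p_match
    under neutral evolution (an illustrative simplification).
    """
    expected = p_match * match + (1.0 - p_match) * mismatch
    if expected >= 0:
        raise ValueError("expected score per column must be negative")

    def f(l):
        return (p_match * math.exp(l * match)
                + (1.0 - p_match) * math.exp(l * mismatch) - 1.0)

    # f is convex, f(0) = 0 and f'(0) < 0, so f is negative just above 0
    # and crosses zero exactly once for l > 0. Bracket that root, then bisect.
    lo, hi = 1e-9, 1.0
    while f(hi) < 0:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def tail_probability(score, lam, length, k=0.1):
    """Approximate P(best gap-free score >= score) over `length` columns.

    k stands in for the Karlin-Altschul constant K; 0.1 is a placeholder
    assumption, not a value derived from the scoring scheme.
    """
    return -math.expm1(-k * length * math.exp(-lam * score))


# Hypothetical regions: slow-evolving (70% neutral identity) versus
# fast-evolving (50% neutral identity).
lam_slow = karlin_lambda(0.70)
lam_fast = karlin_lambda(0.50)

# A match scoring 30 in the slow region versus one scoring 25 in the fast one.
p_slow = tail_probability(30, lam_slow, length=1_000_000)
p_fast = tail_probability(25, lam_fast, length=1_000_000)
print(p_fast < p_slow)
```

Under these assumptions lambda is larger in the fast-evolving region, so chance matches there decay faster with score, and the lower-scoring match receives the smaller tail probability.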
