Significance of inter-species matches when evolutionary rate varies

We develop techniques to estimate the statistical significance of gap-free alignments between two genomic DNA sequences, using human-mouse alignments as an example. The sequences are assumed to be sufficiently similar that some, but not all, of the neutrally evolving regions (i.e., those under no evolutionary constraint) can be reliably aligned. Our goal is to model the situation in which the neutral rate of evolution, and hence the extent of the aligning intervals, varies across the genome. In some cases, this permits the weaker of two matches to be judged as less likely to have arisen by chance, provided it lies in a genomic interval with a high level of background divergence. We employ a hidden Markov model to capture variations in divergence rates, and we assign probability values to gap-free alignments using techniques related to those that BLAST uses for the same purpose. Our methods are illustrated in detail using a 1.49 Mb genomic region. Preliminary results using all of human chromosome 22 suggest that these techniques will extend to the entire human genome.
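
The claim that a weaker match can be the more significant one can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a simplified null model in which each aligned column matches independently with probability p_match (standing in for the local neutral identity), and it uses the Karlin-Altschul scale parameter lambda, the positive root of p_match*exp(lambda*match) + (1 - p_match)*exp(lambda*mismatch) = 1. The function names, the +1/-3 scores, and the identity levels 0.55 and 0.70 are hypothetical values chosen for illustration. The constant K and the search-space size are omitted, so only the exponent lambda*S is compared.

```python
import math


def karlin_lambda(p_match, match=1.0, mismatch=-3.0):
    """Positive root of p*exp(l*match) + (1-p)*exp(l*mismatch) = 1.

    p_match is the assumed chance that a column matches under the null
    (background) model; match/mismatch are illustrative column scores.
    """
    if not 0.0 < p_match < 1.0:
        raise ValueError("p_match must lie strictly between 0 and 1")
    if match <= 0.0 or mismatch >= 0.0:
        raise ValueError("need match > 0 and mismatch < 0")
    if p_match * match + (1.0 - p_match) * mismatch >= 0.0:
        raise ValueError("expected column score must be negative")

    def f(lam):
        return (p_match * math.exp(lam * match)
                + (1.0 - p_match) * math.exp(lam * mismatch) - 1.0)

    # f is convex with f(0) = 0 and f'(0) < 0, so it is negative just
    # above zero and has exactly one positive root. Bracket it, then bisect.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normalized_score(raw_score, p_match, match=1.0, mismatch=-3.0):
    """lambda * S in nats; chance expectation decays like exp(-lambda*S)."""
    return karlin_lambda(p_match, match, mismatch) * raw_score


if __name__ == "__main__":
    # Hypothetical background identities: a slowly and a rapidly diverging interval.
    for label, p, s in [("low divergence", 0.70, 40), ("high divergence", 0.55, 30)]:
        print(label, "raw score", s, "normalized", round(normalized_score(s, p), 2))
```

Under these assumptions, the raw score of 30 in the more divergent interval gets a larger normalized score than the raw score of 40 in the less divergent one, because lambda grows as the background match probability falls.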
