Scoring two-species local alignments to try to statistically separate neutrally evolving from selected DNA segments

We construct several score functions for locating unusually conserved regions in a genome-wide search of aligned DNA from two species, and we test them on regions of the human genome aligned to the mouse genome. The score functions are derived from properties of neutrally evolving sites in the mouse and human genomes, and they can be adjusted to the local background rate of conservation. Their aim is to identify regions of the human genome that are conserved by evolutionary selection because they have an important function, rather than by chance. We use them to obtain a very rough estimate of the amount of DNA in the human genome that is under selection.
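
To make the idea of a score that adjusts to the local background rate concrete, here is a minimal sketch in Python. It is not one of the score functions constructed in this work. It assumes a simple binomial model in which each ungapped alignment column matches independently with a probability equal to the local neutral identity rate. The names `binomial_tail`, `conservation_score`, and `neutral_identity` are hypothetical, and the rates used below are illustrative values, not measurements.

```python
from math import comb, log10


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def conservation_score(human, mouse, neutral_identity):
    """Score an aligned window as -log10 of a binomial tail probability.

    Assumption (illustrative only): neutral columns match independently
    with probability `neutral_identity`, the local background rate.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither species has a gap.
    columns = [
        (h, m)
        for h, m in zip(human.upper(), mouse.upper())
        if h != "-" and m != "-"
    ]
    n = len(columns)
    k = sum(h == m for h, m in columns)
    p_value = binomial_tail(n, k, neutral_identity)
    # Guard against floating-point underflow on long, highly conserved windows.
    if p_value <= 0.0:
        return float("inf")
    return -log10(p_value)


# The same window scores higher where the neutral background is less conserved.
window_human = "ACGTACGTAC"
window_mouse = "ACGTACGTAC"
slow_region = conservation_score(window_human, window_mouse, 0.8)
fast_region = conservation_score(window_human, window_mouse, 0.6)
```

Under this toy model, a fully identical window is less surprising in a slowly evolving region than in a rapidly evolving one, so `fast_region` exceeds `slow_region`. That is the sense in which a score can be adjusted to the local background rate.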
