Bayesian Analysis of Isochores

Statistical identification of the isochore structure of mammalian genomes, that is, the large-scale variation in GC composition (the proportion of DNA bases that are G or C rather than A or T), is necessary for understanding both the evolution of base composition and the many genomic features, such as mutation and recombination rates, that covary with base composition. We have developed a Bayesian method for isochore analysis that we demonstrate to be more accurate than the commonly used binary segmentation approach implemented in the program IsoFinder. The method accounts for both fine-scale and large-scale structure. We adapt direct simulation methods to draw independent and identically distributed (iid) samples from the posterior distribution of our model, and we provide an accurate approximation to this that can analyze the data from a chromosome in a matter of seconds. We apply our method to human chromosome 1. The resulting estimate of how GC content varies across this chromosome is shown to be a better predictor of local recombination rates than the estimate obtained from IsoFinder, and we detect regions consistent with the classic definition of isochores that cover 85% of the chromosome. We also show that a measure of relative GC content is particularly predictive of local recombination rates.
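To make the quantity being segmented concrete, the following is a minimal Python sketch of GC content computed over non-overlapping windows. It illustrates only the definition of GC composition given above and is not the Bayesian method itself. The function names `gc_content` and `windowed_gc`, the window size, and the choice to ignore ambiguous bases such as N are assumptions made for this illustration.

```python
import math


def gc_content(seq: str) -> float:
    """Proportion of unambiguous bases (A, C, G, T) in seq that are G or C.

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    Returns NaN when the sequence contains no unambiguous bases.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        return math.nan
    return gc / total


def windowed_gc(seq: str, window: int) -> list[float]:
    """GC content of consecutive non-overlapping windows of length `window`.

    The final window may be shorter than `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]


if __name__ == "__main__":
    # Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
    toy = "GGCCGCGC" + "ATATTAAT"
    print(windowed_gc(toy, 4))
```

A series of windowed values like this is the kind of input on which a segmentation or changepoint method would look for shifts in mean GC level; the window length used in practice is a modelling choice that this sketch does not settle.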
