Genome segmentation using piecewise constant intensity models and reversible jump MCMC

The availability of whole genome sequences makes it possible to search for global structure in the genome. We consider modeling the occurrence frequencies of discrete patterns (such as starting points of ORFs or other interesting phenomena) along the genome. We use piecewise constant intensity models with a varying number of pieces, and show how a reversible jump Markov chain Monte Carlo (RJMCMC) method can be used to obtain the posterior distribution of the intensity of the patterns along the genome. We apply the method to modeling the occurrence of ORFs in the human genome. The results show that the chromosomes consist of 5-35 clearly distinct segments, and that the posterior number and length of the segments show significant variation. For the yeast genome, on the other hand, the intensity of ORFs is nearly constant.
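To make the model class concrete, the following is a minimal sketch of the likelihood of a piecewise constant intensity model, assuming the pattern occurrences are treated as a Poisson process along the sequence. It is not the authors' implementation: the function name, argument names, and example values are hypothetical, and the priors and the reversible jump moves that change the number of pieces are omitted.

```python
import math
from bisect import bisect_right


def piecewise_log_likelihood(events, boundaries, intensities):
    """Log-likelihood of event positions under a piecewise constant intensity.

    Assumes a Poisson process (an assumption of this sketch).
    boundaries: sorted positions [b0, b1, ..., bk] delimiting k segments;
                segment i covers the half-open interval [b_i, b_{i+1}).
    intensities: k positive rates, one per segment.
    """
    if len(boundaries) != len(intensities) + 1:
        raise ValueError("need exactly one more boundary than intensities")

    # Count the events that fall into each segment.
    counts = [0] * len(intensities)
    for x in events:
        if not boundaries[0] <= x < boundaries[-1]:
            raise ValueError("event outside the modeled region")
        counts[bisect_right(boundaries, x) - 1] += 1

    # Each segment contributes n_i * log(lambda_i) - lambda_i * length_i.
    total = 0.0
    for i, rate in enumerate(intensities):
        length = boundaries[i + 1] - boundaries[i]
        total += counts[i] * math.log(rate) - rate * length
    return total


# Hypothetical toy data: three events in the first half, one in the second.
events = [1, 2, 3, 15]
two_pieces = piecewise_log_likelihood(events, [0, 10, 20], [0.3, 0.1])
one_piece = piecewise_log_likelihood(events, [0, 20], [0.2])
```

On this toy data the two-piece model scores a higher log-likelihood than the single constant rate. In a sampler of the kind the abstract describes, a quantity like this would be combined with priors on the number of pieces, the boundaries, and the rates when deciding whether to accept a proposed move.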
