Bayesian Detection of Changepoints in Finite-State Markov Chains for Multiple Sequences

We consider the analysis of sets of categorical sequences consisting of piecewise homogeneous Markov segments. The sequences are assumed to be governed by a common underlying process, with segments occurring in the same order in each sequence. Segments are defined by a set of unobserved changepoints whose positions and number can vary from sequence to sequence. We propose a Bayesian framework for analyzing such data that places priors on the locations of the changepoints and on the transition matrices, and uses Markov chain Monte Carlo (MCMC) techniques to obtain posterior samples given the data. Experimental results on simulated data illustrate how the methodology can be used to infer posterior distributions for the parameters and changepoints, and demonstrate its ability to handle considerable variability in changepoint locations across sequences. We also apply the approach to sequential data on monsoonal rainfall patterns. Supplementary materials for this article are available online.
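The model described above can be illustrated with a minimal Gibbs sampler. This is a sketch under stated assumptions, not the authors' sampler: it assumes (a) a fixed number K of ordered segments per sequence, with empty segments allowed so that the effective number of changepoints can differ between sequences; (b) a uniform prior over ordered changepoint positions; (c) independent Dirichlet(alpha) priors on the rows of each segment's transition matrix, with the K matrices shared by all sequences; and (d) a likelihood conditioned on each sequence's first state. The function names `segment_labels`, `seq_loglik` and `gibbs` are hypothetical. Each sweep alternates a conjugate Dirichlet draw of the transition matrices (counts pooled over sequences) with a draw of each changepoint from its full conditional.

```python
import numpy as np


def segment_labels(T, taus):
    """Segment index of each transition x[t-1] -> x[t], t = 1..T-1.

    Transition t belongs to segment k when taus[k-1] <= t < taus[k],
    i.e. k is the number of changepoints <= t.
    """
    t = np.arange(1, T)
    return np.searchsorted(np.asarray(taus, dtype=int), t, side="right")


def seq_loglik(x, taus, logP):
    """Log-likelihood of sequence x given its changepoints and the
    per-segment log transition matrices logP[k, i, j] (conditioned on x[0])."""
    k = segment_labels(len(x), taus)
    return logP[k, x[:-1], x[1:]].sum()


def gibbs(seqs, K, S, n_iter=200, alpha=1.0, seed=0):
    """Hypothetical Gibbs sampler: K ordered segments whose transition
    matrices are shared by all sequences, K-1 changepoints per sequence,
    Dirichlet(alpha) priors on matrix rows, uniform prior on changepoints."""
    rng = np.random.default_rng(seed)
    # Start with evenly spaced changepoints in every sequence.
    taus = [[int(c) for c in np.linspace(1, len(x), K + 1)[1:-1]] for x in seqs]
    samples = []
    for _ in range(n_iter):
        # 1) Transition counts per segment, pooled over all sequences.
        counts = np.zeros((K, S, S))
        for x, tau in zip(seqs, taus):
            np.add.at(counts, (segment_labels(len(x), tau), x[:-1], x[1:]), 1)
        # Each row is Dirichlet(alpha + counts); draw via normalised gammas.
        g = rng.gamma(alpha + counts)
        P = g / g.sum(axis=-1, keepdims=True)
        logP = np.log(P)
        # 2) Resample each changepoint given its neighbours and the matrices.
        for x, tau in zip(seqs, taus):
            for j in range(K - 1):
                lo = tau[j - 1] if j > 0 else 1
                hi = tau[j + 1] if j < K - 2 else len(x)
                # Candidates c == lo or c == hi leave an adjacent segment empty.
                cands = np.arange(lo, hi + 1)
                ll = np.empty(len(cands))
                for n, c in enumerate(cands):
                    tau[j] = int(c)
                    ll[n] = seq_loglik(x, tau, logP)
                # Uniform prior, so the full conditional is proportional
                # to the likelihood; normalise on the log scale for stability.
                w = np.exp(ll - ll.max())
                tau[j] = int(rng.choice(cands, p=w / w.sum()))
        samples.append([list(tau) for tau in taus])
    return P, samples
```

Given a list `seqs` of integer-coded NumPy arrays with states in `0..S-1`, `gibbs(seqs, K=2, S=2)` returns the final draw of the transition matrices and the changepoint draws from every iteration; discarding early iterations as burn-in and summarising the rest gives an approximate posterior for each sequence's changepoints. Recomputing the full log-likelihood for every candidate keeps the sketch short but costs O(T^2) per changepoint update; restricting the computation to the transitions between neighbouring changepoints is the obvious optimisation.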
