Composite Likelihood Inference in a Discrete Latent Variable Model for Two-Way “Clustering-by-Segmentation” Problems

ABSTRACT We consider a discrete latent variable model for two-way data arrays, which allows one to simultaneously produce clusters along one of the data dimensions (e.g., exchangeable observational units or features) and contiguous groups, or segments, along the other (e.g., consecutively ordered times or locations). The model relies on a hidden Markov structure but, given its complexity, cannot be estimated by full maximum likelihood. We therefore introduce a composite likelihood methodology based on considering different subsets of the data. The proposed approach is illustrated through simulation and an application to genomic data.
